Models¶
Create Model¶
mindcv.models.model_factory.create_model(model_name, num_classes=1000, pretrained=False, in_channels=3, checkpoint_path='', ema=False, auto_mapping=False, **kwargs)¶
Creates model by name.
| PARAMETER | DESCRIPTION |
|---|---|
| model_name | The name of the model. |
| num_classes | The number of classes. Default: 1000. |
| pretrained | Whether to load the pretrained model. Default: False. |
| in_channels | The number of input channels. Default: 3. |
| checkpoint_path | The path of checkpoint files. Default: "". |
| ema | Whether to use the EMA (exponential moving average) weights. Default: False. |
| auto_mapping | Whether to automatically map the names of checkpoint weights to the names of model weights when the names differ. Default: False. |
| **kwargs | Additional arguments, e.g. "features_only", "out_indices". |
Source code in mindcv\models\model_factory.py, lines 7-46.
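create_model looks a registered builder function up by name and forwards the remaining arguments to it. The underlying registry pattern can be sketched in plain Python (names are hypothetical and illustrative only; mindcv's real factory additionally handles pretrained weights, checkpoint loading, and EMA):

```python
# Minimal sketch of a name-based model factory (illustrative, not mindcv's code).
_registry = {}

def register_model(fn):
    """Record a builder function in the registry under its own name."""
    _registry[fn.__name__] = fn
    return fn

@register_model
def densenet121(num_classes=1000, in_channels=3, **kwargs):
    # A real builder would construct and return a network here.
    return {"name": "densenet121", "num_classes": num_classes, "in_channels": in_channels}

def create_model(model_name, num_classes=1000, in_channels=3, **kwargs):
    """Dispatch to the registered builder for model_name."""
    if model_name not in _registry:
        raise ValueError(f"Unknown model: {model_name}")
    return _registry[model_name](num_classes=num_classes, in_channels=in_channels, **kwargs)

model = create_model("densenet121", num_classes=10)
```

In mindcv the same call shape, e.g. create_model("densenet121", num_classes=10, pretrained=True), returns a ready-to-use network.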
bit¶
mindcv.models.bit¶
MindSpore implementation of BiT_ResNet.
Refer to Big Transfer (BiT): General Visual Representation Learning.
mindcv.models.bit.BiT_ResNet¶
Bases: Cell
BiT_ResNet model class, based on
"Big Transfer (BiT): General Visual Representation Learning" <https://arxiv.org/abs/1912.11370>_
Args:
block(Union[Bottleneck]): block of BiT_ResNetv2.
layers(tuple(int)): number of layers of each stage.
wf(int): width of each layer. Default: 1.
num_classes(int): number of classification classes. Default: 1000.
in_channels(int): number of channels of the input. Default: 3.
groups(int): number of groups for group conv in blocks. Default: 1.
base_width(int): base width of pre group hidden channel in blocks. Default: 64.
norm(nn.Cell): normalization layer in blocks. Default: None.
Source code in mindcv\models\bit.py, lines 149-267.
mindcv.models.bit.BiT_ResNet.forward_features(x)¶
Network forward feature extraction.
Source code in mindcv\models\bit.py, lines 247-253.
mindcv.models.bit.Bottleneck¶
Bases: Cell
Define the basic block of BiT.
Args:
in_channels(int): The channel number of the input tensor of the Conv2d layer.
channels(int): The channel number of the output tensor of the middle Conv2d layer.
stride(int): The movement stride of the 2D convolution kernel. Default: 1.
groups(int): Number of groups for group conv in blocks. Default: 1.
base_width(int): Base width of pre group hidden channel in blocks. Default: 64.
norm(nn.Cell): Normalization layer in blocks. Default: None.
down_sample(nn.Cell): Down sample in blocks. Default: None.
Source code in mindcv\models\bit.py, lines 82-146.
mindcv.models.bit.StdConv2d¶
Bases: Conv2d
Conv2d with Weight Standardization.
Args:
in_channels(int): The channel number of the input tensor of the Conv2d layer.
out_channels(int): The channel number of the output tensor of the Conv2d layer.
kernel_size(int): Specifies the height and width of the 2D convolution kernel.
stride(int): The movement stride of the 2D convolution kernel. Default: 1.
pad_mode(str): Specifies the padding mode. The optional values are "same", "valid", "pad". Default: "same".
padding(int): The number of padding on the height and width directions of the input. Default: 0.
group(int): Splits filter into groups. Default: 1.
Source code in mindcv\models\bit.py, lines 40-79.
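Weight Standardization normalizes each output filter to zero mean and unit variance before it is used in the convolution. A numpy sketch of the transformation (the eps value here is an assumption, not mindcv's exact constant):

```python
import numpy as np

def standardize_weight(w, eps=1e-5):
    """Standardize conv weights per output filter, as in Weight Standardization.

    w: array of shape (out_channels, in_channels, kH, kW).
    Each filter is shifted to zero mean and scaled to unit variance.
    """
    mean = w.mean(axis=(1, 2, 3), keepdims=True)
    var = w.var(axis=(1, 2, 3), keepdims=True)
    return (w - mean) / np.sqrt(var + eps)

w = np.random.randn(8, 3, 3, 3)
ws = standardize_weight(w)
```

The standardized weights then feed a regular convolution; combined with Group Normalization this is what lets BiT transfer well across batch sizes.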
mindcv.models.bit.BiT_resnet101(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get 101-layer ResNet model.
Refer to the base class models.BiT_ResNet for more details.
Source code in mindcv\models\bit.py, lines 298-309.
mindcv.models.bit.BiT_resnet50(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get 50-layer ResNet model.
Refer to the base class models.BiT_ResNet for more details.
Source code in mindcv\models\bit.py, lines 270-281.
mindcv.models.bit.BiT_resnet50x3(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get 50-layer ResNet model with a width factor of 3.
Refer to the base class models.BiT_ResNet for more details.
Source code in mindcv\models\bit.py, lines 284-295.
cait¶
mindcv.models.cait¶
MindSpore implementation of CaiT.
Refer to Going deeper with Image Transformers.
mindcv.models.cait.AttentionTalkingHead¶
Bases: Cell
Talking head is a trick for multi-head attention which adds two extra linear maps over the head dimension, one before and one after the softmax, compared to normal attention.
Source code in mindcv\models\cait.py, lines 140-199.
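The head-mixing idea can be sketched in numpy: attention maps are linearly combined across the head axis both before and after the softmax (proj_l and proj_w stand in for the two learnable mixing matrices; the names are illustrative, not mindcv's):

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def talking_heads(logits, proj_l, proj_w):
    """Talking-heads attention sketch.

    logits: (heads, seq, seq) raw attention scores.
    proj_l, proj_w: (heads, heads) mixing matrices applied to the head axis
    before and after the softmax, respectively.
    """
    mixed = np.einsum("hk,kij->hij", proj_l, logits)   # mix heads before softmax
    attn = softmax(mixed, axis=-1)
    return np.einsum("hk,kij->hij", proj_w, attn)      # mix heads after softmax

h, n = 4, 5
logits = np.random.randn(h, n, n)
out = talking_heads(logits, np.eye(h), np.eye(h))      # identity mixing = plain attention
```

With identity mixing matrices this reduces exactly to ordinary multi-head attention; training learns non-trivial mixings.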
cmt¶
mindcv.models.cmt¶
mindcv.models.cmt.PatchEmbed¶
Bases: Cell
Image to Patch Embedding.
Source code in mindcv\models\cmt.py, lines 169-202.
mindcv.models.cmt.cmt_base(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
CMT-Base
Source code in mindcv\models\cmt.py, lines 441-455.
mindcv.models.cmt.cmt_small(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
CMT-Small
Source code in mindcv\models\cmt.py, lines 424-438.
mindcv.models.cmt.cmt_tiny(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
CMT-Tiny
Source code in mindcv\models\cmt.py, lines 390-404.
mindcv.models.cmt.cmt_xsmall(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
CMT-XSmall
Source code in mindcv\models\cmt.py, lines 407-421.
coat¶
mindcv.models.coat¶
CoaT architecture. Modified from timm/models/vision_transformer.py
mindcv.models.coat.CoaT¶
Bases: Cell
CoaT class.
Source code in mindcv\models\coat.py, lines 442-690.
mindcv.models.coat.ConvPosEnc¶
Bases: Cell
Convolutional Position Encoding. Note: This module is similar to the conditional position encoding in CPVT.
Source code in mindcv\models\coat.py, lines 203-237.
mindcv.models.coat.FactorAtt_ConvRelPosEnc¶
Bases: Cell
Factorized attention with convolutional relative position encoding class.
Source code in mindcv\models\coat.py, lines 150-200.
mindcv.models.coat.Mlp¶
Bases: Cell
MLP Cell.
Source code in mindcv\models\coat.py, lines 58-82.
mindcv.models.coat.ParallelBlock¶
Bases: Cell
Parallel block class.
Source code in mindcv\models\coat.py, lines 282-397.
mindcv.models.coat.ParallelBlock.downsample(x, output_size, size)¶
Feature map down-sampling.
Source code in mindcv\models\coat.py, lines 339-341.
mindcv.models.coat.ParallelBlock.interpolate(x, output_size, size)¶
Feature map interpolation.
Source code in mindcv\models\coat.py, lines 343-358.
mindcv.models.coat.ParallelBlock.upsample(x, output_size, size)¶
Feature map up-sampling.
Source code in mindcv\models\coat.py, lines 335-337.
mindcv.models.coat.PatchEmbed¶
Bases: Cell
Image to Patch Embedding.
Source code in mindcv\models\coat.py, lines 400-439.
mindcv.models.coat.SerialBlock¶
Bases: Cell
Serial block class. Note: In this implementation, each serial block only contains a conv-attention and a FFN (MLP) module.
Source code in mindcv\models\coat.py, lines 240-279.
convit¶
mindcv.models.convit¶
MindSpore implementation of ConViT.
Refer to ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases.
mindcv.models.convit.Block¶
Bases: Cell
Basic module of ConViT.
Source code in mindcv\models\convit.py, lines 175-207.
mindcv.models.convit.ConViT¶
Bases: Cell
ConViT model class, based on
"Improving Vision Transformers with Soft Convolutional Inductive Biases" <https://arxiv.org/pdf/2103.10697.pdf>_
| PARAMETER | DESCRIPTION |
|---|---|
| in_channels | number of input channels. Default: 3. |
| num_classes | number of classification classes. Default: 1000. |
| image_size | input image size. Default: 224. |
| patch_size | image patch size. Default: 16. |
| embed_dim | embedding dimension in all heads. Default: 48. |
| num_heads | number of heads. Default: 12. |
| drop_rate | dropout rate. Default: 0. |
| drop_path_rate | drop path rate. Default: 0.1. |
| depth | model block depth. Default: 12. |
| mlp_ratio | ratio of hidden features in Mlp. Default: 4. |
| qkv_bias | whether the qkv layers have bias. Default: False. |
| attn_drop_rate | attention layers dropout rate. Default: 0. |
| locality_strength | determines how focused each head is around its attention center. Default: 1. |
| local_up_to_layer | number of GPSA layers. Default: 10. |
| use_pos_embed | whether to use position embedding. Default: True. |
Source code in mindcv\models\convit.py, lines 210-335.
mindcv.models.convit.convit_base(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get ConViT base model.
Refer to the base class models.ConViT for more details.
Source code in mindcv\models\convit.py, lines 398-410.
mindcv.models.convit.convit_base_plus(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get ConViT base+ model.
Refer to the base class models.ConViT for more details.
Source code in mindcv\models\convit.py, lines 413-425.
mindcv.models.convit.convit_small(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get ConViT small model.
Refer to the base class models.ConViT for more details.
Source code in mindcv\models\convit.py, lines 368-380.
mindcv.models.convit.convit_small_plus(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get ConViT small+ model.
Refer to the base class models.ConViT for more details.
Source code in mindcv\models\convit.py, lines 383-395.
mindcv.models.convit.convit_tiny(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get ConViT tiny model.
Refer to the base class models.ConViT for more details.
Source code in mindcv\models\convit.py, lines 338-350.
mindcv.models.convit.convit_tiny_plus(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get ConViT tiny+ model.
Refer to the base class models.ConViT for more details.
Source code in mindcv\models\convit.py, lines 353-365.
convnext¶
mindcv.models.convnext¶
MindSpore implementation of ConvNeXt and ConvNeXt V2.
Refer to: A ConvNet for the 2020s
ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
mindcv.models.convnext.Block¶
Bases: Cell
ConvNeXt Block. There are two equivalent implementations:
(1) DwConv -> LayerNorm (channels_first) -> 1x1 Conv -> GELU -> 1x1 Conv; all in (N, C, H, W).
(2) DwConv -> Permute to (N, H, W, C); LayerNorm (channels_last) -> Linear -> GELU -> Linear; Permute back.
Unlike the official implementation, this one allows a choice of (1) or (2). The 1x1 conv can be faster with an appropriate choice of LayerNorm implementation, but as model size increases the tradeoffs appear to change and nn.Linear becomes the better choice. This was observed with PyTorch 1.10 on a 3090 GPU; it could change over time and with different hardware.
Args:
dim: Number of input channels.
drop_path: Stochastic depth rate. Default: 0.0.
layer_scale_init_value: Init value for Layer Scale. Default: 1e-6.
Source code in mindcv\models\convnext.py, lines 105-153.
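The drop_path argument above refers to stochastic depth: during training, a whole residual branch is zeroed with some probability per sample, and kept branches are rescaled to preserve the expectation. A minimal numpy sketch (illustrative only, not mindcv's implementation):

```python
import numpy as np

def drop_path(x, drop_prob, training=True, rng=None):
    """Stochastic depth: drop the entire residual branch per sample.

    x: array whose first axis is the batch dimension.
    One Bernoulli draw per sample decides whether its branch output is kept;
    kept outputs are divided by keep_prob so the expected value is unchanged.
    """
    if not training or drop_prob == 0.0:
        return x
    rng = rng or np.random.default_rng()
    keep_prob = 1.0 - drop_prob
    mask = rng.random((x.shape[0],) + (1,) * (x.ndim - 1)) < keep_prob
    return x * mask / keep_prob

x = np.ones((4, 2, 2))
y = drop_path(x, drop_prob=0.5)   # each sample is either all 0.0 or all 2.0
```

At inference time (training=False) the branch is simply passed through, so no rescaling is needed.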
mindcv.models.convnext.ConvNeXt¶
Bases: Cell
ConvNeXt and ConvNeXt V2 model class, based on
"A ConvNet for the 2020s" <https://arxiv.org/abs/2201.03545>_ and
"ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders" <https://arxiv.org/abs/2301.00808>_
| PARAMETER | DESCRIPTION |
|---|---|
| in_channels | dim of the input channel. |
| num_classes | dim of the classes predicted. |
| depths | the depths of each layer. |
| dims | the middle dim of each layer. |
| drop_path_rate | the rate of drop path. Default: 0.0. |
| layer_scale_init_value | the init value for Layer Scale. Default: 1e-6. |
| head_init_scale | the init scaling value for the classifier head. Default: 1.0. |
| use_grn | If True, use Global Response Normalization in each block. Default: False. |
Source code in mindcv\models\convnext.py, lines 156-256.
mindcv.models.convnext.ConvNextLayerNorm¶
Bases: LayerNorm
LayerNorm for channels_first tensors with 2d spatial dimensions (i.e., N, C, H, W).
Source code in mindcv\models\convnext.py, lines 80-102.
mindcv.models.convnext.GRN¶
Bases: Cell
GRN (Global Response Normalization) layer.
Source code in mindcv\models\convnext.py, lines 65-77.
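The GRN operation from the ConvNeXt V2 paper can be sketched in numpy for a channels-last input: compute the global L2 norm per channel, normalize it by the mean norm across channels, and use it to rescale the input with a residual connection. In the real layer gamma and beta are learnable; they are scalars here for illustration.

```python
import numpy as np

def grn(x, gamma=1.0, beta=0.0, eps=1e-6):
    """Global Response Normalization sketch for channels-last input (N, H, W, C)."""
    gx = np.sqrt((x ** 2).sum(axis=(1, 2), keepdims=True))   # L2 norm per channel
    nx = gx / (gx.mean(axis=-1, keepdims=True) + eps)        # divisive normalization across channels
    return gamma * (x * nx) + beta + x                       # rescale + residual

x = np.random.randn(2, 4, 4, 8)
out = grn(x)
```

The normalization encourages feature diversity across channels, which is the motivation given in the ConvNeXt V2 paper.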
mindcv.models.convnext.convnext_base(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get ConvNeXt base model.
Refer to the base class models.ConvNeXt for more details.
Source code in mindcv\models\convnext.py, lines 287-296.
mindcv.models.convnext.convnext_large(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get ConvNeXt large model.
Refer to the base class models.ConvNeXt for more details.
Source code in mindcv\models\convnext.py, lines 299-308.
mindcv.models.convnext.convnext_small(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get ConvNeXt small model.
Refer to the base class models.ConvNeXt for more details.
Source code in mindcv\models\convnext.py, lines 275-284.
mindcv.models.convnext.convnext_tiny(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get ConvNeXt tiny model.
Refer to the base class models.ConvNeXt for more details.
Source code in mindcv\models\convnext.py, lines 263-272.
mindcv.models.convnext.convnext_xlarge(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get ConvNeXt xlarge model.
Refer to the base class models.ConvNeXt for more details.
Source code in mindcv\models\convnext.py, lines 311-320.
mindcv.models.convnext.convnextv2_atto(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get ConvNeXt_v2 atto model.
Refer to the base class models.ConvNeXt for more details.
Source code in mindcv\models\convnext.py, lines 323-331.
mindcv.models.convnext.convnextv2_base(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get ConvNeXt_v2 base model.
Refer to the base class models.ConvNeXt for more details.
Source code in mindcv\models\convnext.py, lines 378-386.
mindcv.models.convnext.convnextv2_femto(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get ConvNeXt_v2 femto model.
Refer to the base class models.ConvNeXt for more details.
Source code in mindcv\models\convnext.py, lines 334-342.
mindcv.models.convnext.convnextv2_huge(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get ConvNeXt_v2 huge model.
Refer to the base class models.ConvNeXt for more details.
Source code in mindcv\models\convnext.py, lines 400-408.
mindcv.models.convnext.convnextv2_large(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get ConvNeXt_v2 large model.
Refer to the base class models.ConvNeXt for more details.
Source code in mindcv\models\convnext.py, lines 389-397.
mindcv.models.convnext.convnextv2_nano(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get ConvNeXt_v2 nano model.
Refer to the base class models.ConvNeXt for more details.
Source code in mindcv\models\convnext.py, lines 356-364.
mindcv.models.convnext.convnextv2_pico(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get ConvNeXt_v2 pico model.
Refer to the base class models.ConvNeXt for more details.
Source code in mindcv\models\convnext.py, lines 345-353.
mindcv.models.convnext.convnextv2_tiny(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get ConvNeXt_v2 tiny model.
Refer to the base class models.ConvNeXt for more details.
Source code in mindcv\models\convnext.py, lines 367-375.
crossvit¶
mindcv.models.crossvit¶
MindSpore implementation of CrossViT.
Refer to CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification.
mindcv.models.crossvit.PatchEmbed¶
Bases: Cell
Image to Patch Embedding.
Source code in mindcv\models\crossvit.py, lines 104-147.
mindcv.models.crossvit.VisionTransformer¶
Bases: Cell
Vision Transformer with support for patch or hybrid CNN input stage.
Source code in mindcv\models\crossvit.py, lines 311-451.
densenet¶
mindcv.models.densenet¶
MindSpore implementation of DenseNet.
Refer to: Densely Connected Convolutional Networks
mindcv.models.densenet.DenseNet¶
Bases: Cell
Densenet-BC model class, based on
"Densely Connected Convolutional Networks" <https://arxiv.org/pdf/1608.06993.pdf>_
| PARAMETER | DESCRIPTION |
|---|---|
| growth_rate | how many filters to add each layer (k in the paper). |
| block_config | how many layers in each pooling block. Default: (6, 12, 24, 16). |
| num_init_features | number of filters in the first Conv2d. Default: 64. |
| bn_size | multiplicative factor for the number of bottleneck layers (i.e. bn_size * k features in the bottleneck layer). Default: 4. |
| drop_rate | dropout rate after each dense layer. Default: 0. |
| in_channels | number of input channels. Default: 3. |
| num_classes | number of classification classes. Default: 1000. |
Source code in mindcv\models\densenet.py, lines 126-222.
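The growth_rate and block_config parameters fully determine the channel count: each dense layer adds growth_rate (k) feature maps, and a transition layer between blocks compresses the channels. The arithmetic can be sketched in plain Python (the 0.5 compression factor is the DenseNet-BC convention, assumed here):

```python
def densenet_channels(num_init_features=64, growth_rate=32,
                      block_config=(6, 12, 24, 16), compression=0.5):
    """Track the feature-map count through a DenseNet-BC backbone.

    Each dense layer concatenates growth_rate new channels; each transition
    layer (between blocks, not after the last) halves the channel count.
    """
    c = num_init_features
    for i, num_layers in enumerate(block_config):
        c += num_layers * growth_rate
        if i < len(block_config) - 1:
            c = int(c * compression)
    return c

final = densenet_channels()   # densenet121 configuration
```

With the default (6, 12, 24, 16) configuration this yields 1024 features entering the classifier, matching DenseNet-121.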
mindcv.models.densenet.densenet121(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get 121-layer DenseNet model.
Refer to the base class models.DenseNet for more details.
Source code in mindcv\models\densenet.py, lines 225-236.
mindcv.models.densenet.densenet161(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get 161-layer DenseNet model.
Refer to the base class models.DenseNet for more details.
Source code in mindcv\models\densenet.py, lines 239-250.
mindcv.models.densenet.densenet169(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get 169-layer DenseNet model.
Refer to the base class models.DenseNet for more details.
Source code in mindcv\models\densenet.py, lines 253-264.
mindcv.models.densenet.densenet201(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get 201-layer DenseNet model.
Refer to the base class models.DenseNet for more details.
Source code in mindcv\models\densenet.py, lines 267-278.
dpn¶
mindcv.models.dpn¶
MindSpore implementation of DPN.
Refer to: Dual Path Networks
mindcv.models.dpn.BottleBlock¶
Bases: Cell
A block for the Dual Path Architecture.
Source code in mindcv\models\dpn.py, lines 44-77.
mindcv.models.dpn.DPN¶
Bases: Cell
DPN model class, based on
"Dual Path Networks" <https://arxiv.org/pdf/1707.01629.pdf>_
| PARAMETER | DESCRIPTION |
|---|---|
| num_init_channel | the output channel of the first blocks. Default: 64. |
| k_r | the first channel of each stage. Default: 96. |
| g | number of groups in the conv2d. Default: 32. |
| k_sec | multiplicative factor for number of bottleneck layers. Default: 4. |
| inc_sec | the first output channel in each stage. Default: (16, 32, 24, 128). |
| in_channels | number of input channels. Default: 3. |
| num_classes | number of classification classes. Default: 1000. |
Source code in mindcv\models\dpn.py, lines 140-259.
mindcv.models.dpn.DualPathBlock¶
Bases: Cell
A block for Dual Path Networks that combines the projection, residual, and dense connections.
Source code in mindcv\models\dpn.py, lines 80-137.
mindcv.models.dpn.dpn107(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get 107-layer DPN model.
Refer to the base class models.DPN for more details.
Source code in mindcv\models\dpn.py, lines 304-315.
mindcv.models.dpn.dpn131(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get 131-layer DPN model.
Refer to the base class models.DPN for more details.
Source code in mindcv\models\dpn.py, lines 290-301.
mindcv.models.dpn.dpn92(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get 92-layer DPN model.
Refer to the base class models.DPN for more details.
Source code in mindcv\models\dpn.py, lines 262-273.
mindcv.models.dpn.dpn98(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get 98-layer DPN model.
Refer to the base class models.DPN for more details.
Source code in mindcv\models\dpn.py, lines 276-287.
edgenext¶
mindcv.models.edgenext¶
MindSpore implementation of EdgeNeXt.
Refer to EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications.
mindcv.models.edgenext.EdgeNeXt¶
Bases: Cell
EdgeNeXt model class, based on
"Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision" <https://arxiv.org/abs/2206.10589>_
| PARAMETER | DESCRIPTION |
|---|---|
| in_channels | number of input channels. Default: 3. |
| num_classes | number of classification classes. Default: 1000. |
| depths | the depths of each layer. Default: [0, 0, 0, 3]. |
| dims | the middle dim of each layer. Default: [24, 48, 88, 168]. |
| global_block | number of global blocks. Default: [0, 0, 0, 3]. |
| global_block_type | type of global block. Default: ['None', 'None', 'None', 'SDTA']. |
| drop_path_rate | stochastic depth rate. Default: 0. |
| layer_scale_init_value | value of layer scale initialization. Default: 1e-6. |
| head_init_scale | scale of head initialization. Default: 1. |
| expan_ratio | ratio of expansion. Default: 4. |
| kernel_sizes | kernel sizes of different stages. Default: [7, 7, 7, 7]. |
| heads | number of attention heads. Default: [8, 8, 8, 8]. |
| use_pos_embd_xca | whether to use position embedding in XCA. Default: [False, False, False, False]. |
| use_pos_embd_global | whether to use position embedding globally. Default: False. |
| d2_scales | scales of splitting channels. |
Source code in mindcv\models\edgenext.py, lines 296-400.
mindcv.models.edgenext.LayerNorm¶
Bases: LayerNorm
LayerNorm for channels_first tensors with 2d spatial dimensions (i.e., N, C, H, W).
Source code in mindcv\models\edgenext.py, lines 57-77.
mindcv.models.edgenext.edgenext_base(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get edgenext_base model.
Refer to the base class models.EdgeNeXt for more details.
Source code in mindcv\models\edgenext.py, lines 472-491.
mindcv.models.edgenext.edgenext_small(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get edgenext_small model.
Refer to the base class models.EdgeNeXt for more details.
Source code in mindcv\models\edgenext.py, lines 450-469.
mindcv.models.edgenext.edgenext_x_small(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get edgenext_x_small model.
Refer to the base class models.EdgeNeXt for more details.
Source code in mindcv\models\edgenext.py, lines 427-447.
mindcv.models.edgenext.edgenext_xx_small(pretrained=False, num_classes=1000, in_channels=3, **kwargs)¶
Get edgenext_xx_small model.
Refer to the base class models.EdgeNeXt for more details.
Source code in mindcv\models\edgenext.py, lines 403-424.
efficientnet¶
mindcv.models.efficientnet¶
EfficientNet Architecture.
mindcv.models.efficientnet.EfficientNet¶
Bases: Cell
EfficientNet architecture.
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.
| PARAMETER | DESCRIPTION |
|---|---|
| arch | The name of the model. |
| dropout_rate | The dropout rate of efficientnet. |
| width_mult | The ratio of the channel. Default: 1.0. |
| depth_mult | The ratio of num_layers. Default: 1.0. |
| in_channels | The input channels. Default: 3. |
| num_classes | The number of classes. Default: 1000. |
| inverted_residual_setting | The settings of block. Default: None. |
| drop_path_prob | The drop path rate of MBConv. Default: 0.2. |
| norm_layer | The normalization layer. Default: None. |

Inputs
- x (Tensor): Tensor of shape (N, C_in, H_in, W_in).

Outputs
- Tensor of shape (N, 1000).
Source code in mindcv\models\efficientnet.py, lines 275-476.
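width_mult and depth_mult implement EfficientNet's compound scaling: base channel counts are multiplied by width_mult (and conventionally rounded to a multiple of 8) while per-stage layer counts are multiplied by depth_mult and rounded up. The helpers below sketch the common convention; the exact rounding used in mindcv is an assumption here.

```python
import math

def adjust_channels(channels, width_mult, divisor=8):
    """Scale a channel count by width_mult and round to a multiple of divisor.

    Rounds to the nearest multiple, but never rounds down by more than 10%
    (the usual "make_divisible" rule).
    """
    c = channels * width_mult
    new_c = max(divisor, int(c + divisor / 2) // divisor * divisor)
    if new_c < 0.9 * c:
        new_c += divisor
    return new_c

def adjust_depth(num_layers, depth_mult):
    """Scale a layer count by depth_mult, rounding up."""
    return int(math.ceil(num_layers * depth_mult))
```

For example, scaling 32 channels by width_mult=1.5 gives 48, and scaling a 2-layer stage by depth_mult=1.2 gives 3 layers.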
mindcv.models.efficientnet.EfficientNet.construct(x)
¶
construct
Source code in mindcv\models\efficientnet.py
456 457 458 459 | |
mindcv.models.efficientnet.FusedMBConv
¶
Bases: Cell
FusedMBConv
Source code in mindcv\models\efficientnet.py
222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 | |
mindcv.models.efficientnet.FusedMBConvConfig
¶
Bases: MBConvConfig
FusedMBConvConfig
Source code in mindcv\models\efficientnet.py
206 207 208 209 210 211 212 213 214 215 216 217 218 219 | |
mindcv.models.efficientnet.MBConv
¶
Bases: Cell
MBConv Module.
| PARAMETER | DESCRIPTION |
|---|---|
cnf |
The configuration class which contains the block parameters (in_channels, out_channels, num_layers) and the helper functions that compute these parameters after multiplying by the expand_ratio.
TYPE:
|
drop_path_prob |
The drop path rate in MBConv. Default: 0.2.
TYPE:
|
norm |
The BatchNorm Method. Default: None.
TYPE:
|
se_layer |
The squeeze-excite Module. Default: SqueezeExcite.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
Tensor |
Source code in mindcv\models\efficientnet.py
139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 | |
mindcv.models.efficientnet.MBConvConfig
¶
The parameters of MBConv which need to be multiplied by the expand_ratio.
| PARAMETER | DESCRIPTION |
|---|---|
expand_ratio |
The expansion factor of the hidden channels with respect to in_channels.
TYPE:
|
kernel_size |
The kernel size of the depthwise conv.
TYPE:
|
stride |
The stride of the depthwise conv.
TYPE:
|
in_chs |
The input_channels of the MBConv Module.
TYPE:
|
out_chs |
The output_channels of the MBConv Module.
TYPE:
|
num_layers |
The number of MBConv modules.
TYPE:
|
width_cnf |
The width multiplier for the number of channels. Default: 1.0.
TYPE:
|
depth_cnf |
The depth multiplier for the number of layers. Default: 1.0.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
None |
Examples:
>>> cnf = MBConvConfig(1, 3, 1, 32, 16, 1)
>>> print(cnf.input_channels)
Source code in mindcv\models\efficientnet.py
67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 | |
mindcv.models.efficientnet.MBConvConfig.adjust_channels(channels, width_cnf, min_value=None)
staticmethod
¶
Calculate the width of MBConv.
| PARAMETER | DESCRIPTION |
|---|---|
channels |
The number of channels.
TYPE:
|
width_cnf |
The width multiplier for the channels.
TYPE:
|
min_value |
The minimum number of channels. Default: None.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
int
|
int, the width of MBConv. |
Source code in mindcv\models\efficientnet.py
107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 | |
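This width adjustment is commonly implemented with a "make divisible" helper (as in the original EfficientNet reference code) that rounds the scaled channel count to the nearest multiple of 8 while never dropping more than 10% below the target. A minimal sketch under that assumption (the helper name is illustrative, not the mindcv API):

```python
def make_divisible(v, divisor=8, min_value=None):
    """Round v to the nearest multiple of `divisor`, staying within 10% of v."""
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Never shrink the channel count by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

def adjust_channels(channels, width_cnf, min_value=None):
    # Scale the channel count by the width multiplier, then round.
    return make_divisible(channels * width_cnf, 8, min_value)
```

For example, adjust_channels(32, 1.0) stays 32, while adjust_channels(32, 1.5) rounds to 48.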
mindcv.models.efficientnet.MBConvConfig.adjust_depth(num_layers, depth_cnf)
staticmethod
¶
Calculate the depth of MBConv.
| PARAMETER | DESCRIPTION |
|---|---|
num_layers |
The number of MBConv modules.
TYPE:
|
depth_cnf |
The ratio of num_layers.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
int
|
int, the depth of MBConv. |
Source code in mindcv\models\efficientnet.py
123 124 125 126 127 128 129 130 131 132 133 134 135 136 | |
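Depth scaling is simpler: the layer count is multiplied by the depth multiplier and rounded up, so a scaled-up stage never loses layers. A one-line sketch under that assumption:

```python
import math

def adjust_depth(num_layers, depth_cnf):
    # Scale the number of layers and round up, keeping at least the scaled count.
    return int(math.ceil(num_layers * depth_cnf))
```

For example, adjust_depth(2, 1.8) gives 4 layers.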
mindcv.models.efficientnet.efficientnet_b0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Constructs an EfficientNet B0 architecture from
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.
| PARAMETER | DESCRIPTION |
|---|---|
pretrained |
If True, returns a model pretrained on IMAGENET. Default: False.
TYPE:
|
num_classes |
The number of classes. Default: 1000.
TYPE:
|
in_channels |
The input channels. Default: 3.
TYPE:
|
Inputs
- x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs
Tensor of shape :math:`(N, CLASSES_{out})`.
Source code in mindcv\models\efficientnet.py
497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 | |
mindcv.models.efficientnet.efficientnet_b1(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Constructs an EfficientNet B1 architecture from
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.
| PARAMETER | DESCRIPTION |
|---|---|
pretrained |
If True, returns a model pretrained on IMAGENET. Default: False.
TYPE:
|
num_classes |
The number of classes. Default: 1000.
TYPE:
|
in_channels |
The input channels. Default: 3.
TYPE:
|
Inputs
- x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs
Tensor of shape :math:`(N, CLASSES_{out})`.
Source code in mindcv\models\efficientnet.py
517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 | |
mindcv.models.efficientnet.efficientnet_b2(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Constructs an EfficientNet B2 architecture from
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.
| PARAMETER | DESCRIPTION |
|---|---|
pretrained |
If True, returns a model pretrained on IMAGENET. Default: False.
TYPE:
|
num_classes |
The number of classes. Default: 1000.
TYPE:
|
in_channels |
The input channels. Default: 3.
TYPE:
|
Inputs
- x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs
Tensor of shape :math:`(N, CLASSES_{out})`.
Source code in mindcv\models\efficientnet.py
537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 | |
mindcv.models.efficientnet.efficientnet_b3(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Constructs an EfficientNet B3 architecture from
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.
| PARAMETER | DESCRIPTION |
|---|---|
pretrained |
If True, returns a model pretrained on IMAGENET. Default: False.
TYPE:
|
num_classes |
The number of classes. Default: 1000.
TYPE:
|
in_channels |
The input channels. Default: 3.
TYPE:
|
Inputs
- x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs
Tensor of shape :math:`(N, CLASSES_{out})`.
Source code in mindcv\models\efficientnet.py
557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 | |
mindcv.models.efficientnet.efficientnet_b4(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Constructs an EfficientNet B4 architecture from
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.
| PARAMETER | DESCRIPTION |
|---|---|
pretrained |
If True, returns a model pretrained on IMAGENET. Default: False.
TYPE:
|
num_classes |
The number of classes. Default: 1000.
TYPE:
|
in_channels |
The input channels. Default: 3.
TYPE:
|
Inputs
- x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs
Tensor of shape :math:`(N, CLASSES_{out})`.
Source code in mindcv\models\efficientnet.py
577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 | |
mindcv.models.efficientnet.efficientnet_b5(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Constructs an EfficientNet B5 architecture from
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.
| PARAMETER | DESCRIPTION |
|---|---|
pretrained |
If True, returns a model pretrained on IMAGENET. Default: False.
TYPE:
|
num_classes |
The number of classes. Default: 1000.
TYPE:
|
in_channels |
The input channels. Default: 3.
TYPE:
|
Inputs
- x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs
Tensor of shape :math:`(N, CLASSES_{out})`.
Source code in mindcv\models\efficientnet.py
597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 | |
mindcv.models.efficientnet.efficientnet_b6(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Constructs an EfficientNet B6 architecture from
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.
| PARAMETER | DESCRIPTION |
|---|---|
pretrained |
If True, returns a model pretrained on IMAGENET. Default: False.
TYPE:
|
num_classes |
The number of classes. Default: 1000.
TYPE:
|
in_channels |
The input channels. Default: 3.
TYPE:
|
Inputs
- x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs
Tensor of shape :math:`(N, CLASSES_{out})`.
Source code in mindcv\models\efficientnet.py
617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 | |
mindcv.models.efficientnet.efficientnet_b7(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Constructs an EfficientNet B7 architecture from
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.
| PARAMETER | DESCRIPTION |
|---|---|
pretrained |
If True, returns a model pretrained on IMAGENET. Default: False.
TYPE:
|
num_classes |
The number of classes. Default: 1000.
TYPE:
|
in_channels |
The input channels. Default: 3.
TYPE:
|
Inputs
- x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs
Tensor of shape :math:`(N, CLASSES_{out})`.
Source code in mindcv\models\efficientnet.py
637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 | |
mindcv.models.efficientnet.efficientnet_v2_l(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Constructs an EfficientNetV2-L architecture from
EfficientNetV2: Smaller Models and Faster Training <https://arxiv.org/abs/2104.00298>_.
| PARAMETER | DESCRIPTION |
|---|---|
pretrained |
If True, returns a model pretrained on IMAGENET. Default: False.
TYPE:
|
num_classes |
The number of classes. Default: 1000.
TYPE:
|
in_channels |
The input channels. Default: 3.
TYPE:
|
Inputs
- x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs
Tensor of shape :math:`(N, CLASSES_{out})`.
Source code in mindcv\models\efficientnet.py
697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 | |
mindcv.models.efficientnet.efficientnet_v2_m(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Constructs an EfficientNetV2-M architecture from
EfficientNetV2: Smaller Models and Faster Training <https://arxiv.org/abs/2104.00298>_.
| PARAMETER | DESCRIPTION |
|---|---|
pretrained |
If True, returns a model pretrained on IMAGENET. Default: False.
TYPE:
|
num_classes |
The number of classes. Default: 1000.
TYPE:
|
in_channels |
The input channels. Default: 3.
TYPE:
|
Inputs
- x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs
Tensor of shape :math:`(N, CLASSES_{out})`.
Source code in mindcv\models\efficientnet.py
677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 | |
mindcv.models.efficientnet.efficientnet_v2_s(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Constructs an EfficientNetV2-S architecture from
EfficientNetV2: Smaller Models and Faster Training <https://arxiv.org/abs/2104.00298>_.
| PARAMETER | DESCRIPTION |
|---|---|
pretrained |
If True, returns a model pretrained on IMAGENET. Default: False.
TYPE:
|
num_classes |
The number of classes. Default: 1000.
TYPE:
|
in_channels |
The input channels. Default: 3.
TYPE:
|
Inputs
- x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs
Tensor of shape :math:`(N, CLASSES_{out})`.
Source code in mindcv\models\efficientnet.py
657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 | |
mindcv.models.efficientnet.efficientnet_v2_xl(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Constructs an EfficientNetV2-XL architecture from
EfficientNetV2: Smaller Models and Faster Training <https://arxiv.org/abs/2104.00298>_.
| PARAMETER | DESCRIPTION |
|---|---|
pretrained |
If True, returns a model pretrained on IMAGENET. Default: False.
TYPE:
|
num_classes |
The number of classes. Default: 1000.
TYPE:
|
in_channels |
The input channels. Default: 3.
TYPE:
|
Inputs
- x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs
Tensor of shape :math:`(N, CLASSES_{out})`.
Source code in mindcv\models\efficientnet.py
717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 | |
features¶
mindcv.models.features
¶
mindcv.models.features.FeatureExtractWrapper
¶
Bases: Cell
A wrapper of the original model that performs feature extraction at each stride.
Basically, it performs 3 steps: 1. extract the return node names from the network's property
feature_info; 2. partially flatten the network architecture if the network's attribute flatten_sequential
is True; 3. rebuild the forward steps and output the features based on the return node names.
It also provides a property out_channels on the wrapped model, returning the number of features at each output
layer. This property is usually used for downstream tasks that require feature information at network
build stage.
Note that applying this wrapper relies on a strong assumption: each of the outermost cells
must be registered in the same order as it is used, and no cell may be reused, not even a ReLU
cell. Otherwise, the returned result may not be correct.
Also note that the wrapper essentially rebuilds the model, so the default checkpoint parameters cannot be loaded correctly once the model is wrapped. To use pretrained weights, load the weights first and then use this wrapper to rebuild the model.
| PARAMETER | DESCRIPTION |
|---|---|
net |
The model need to be wrapped.
TYPE:
|
out_indices |
The indices of the output features. Default: [0, 1, 2, 3, 4]
TYPE:
|
Source code in mindcv\models\features.py
36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 | |
mindcv.models.features.FeatureExtractWrapper.out_channels
property
¶
The output channels of the model, filtered by the out_indices.
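The ordering assumption above can be illustrated with a framework-free toy: stages are called strictly in registration order, and outputs are collected at the requested indices. A minimal sketch (names are illustrative, not the mindcv API):

```python
def run_with_features(stages, x, out_indices=(0, 1, 2, 3, 4)):
    """Run `stages` sequentially, collecting intermediate outputs at `out_indices`."""
    features = []
    for i, stage in enumerate(stages):
        x = stage(x)  # each stage must be used exactly once, in registration order
        if i in out_indices:
            features.append(x)
    return features
```

Because extraction keys off the call order, reusing one cell in two places would record it only once, which is why the wrapper forbids cell reuse.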
ghostnet¶
mindcv.models.ghostnet
¶
MindSpore implementation of GhostNet.
Refer to GhostNet: More Features from Cheap Operations.
mindcv.models.ghostnet.GhostNet
¶
Bases: Cell
GhostNet model class, based on
"GhostNet: More Features from Cheap Operations " <https://arxiv.org/abs/1911.11907>_.
Args:
num_classes: number of classification classes. Default: 1000.
width: base width of hidden channel in blocks. Default: 1.0.
in_channels: number of input channels. Default: 3.
drop_rate: dropout rate applied to the features before classification. Default: 0.2.
Source code in mindcv\models\ghostnet.py
177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 | |
mindcv.models.ghostnet.HardSigmoid
¶
Bases: Cell
Implementation of relu6(x + 3) / 6
Source code in mindcv\models\ghostnet.py
41 42 43 44 45 46 47 48 49 | |
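The hard sigmoid is a piecewise-linear approximation of the sigmoid built from relu6. A scalar sketch of the formula:

```python
def relu6(x):
    # Clamp the input to [0, 6].
    return min(max(x, 0.0), 6.0)

def hard_sigmoid(x):
    # Piecewise-linear sigmoid approximation: 0 below -3, 1 above 3, linear between.
    return relu6(x + 3.0) / 6.0
```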
mindcv.models.ghostnet.ghostnet_050(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
GhostNet-0.5x
Source code in mindcv\models\ghostnet.py
298 299 300 301 302 303 304 305 306 307 | |
mindcv.models.ghostnet.ghostnet_100(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
GhostNet-1.0x
Source code in mindcv\models\ghostnet.py
310 311 312 313 314 315 316 317 318 319 | |
mindcv.models.ghostnet.ghostnet_130(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
GhostNet-1.3x
Source code in mindcv\models\ghostnet.py
322 323 324 325 326 327 328 329 330 331 | |
halonet¶
mindcv.models.halonet
¶
MindSpore implementation of HaloNet.
Refer to Scaling Local Self-Attention for Parameter-Efficient Visual Backbones.
mindcv.models.halonet.ActLayer
¶
Bases: Cell
Build Activation Layer according to act type
Source code in mindcv\models\halonet.py
70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 | |
mindcv.models.halonet.BatchNormAct2d
¶
Bases: Cell
Build a layer containing: bn - act
Source code in mindcv\models\halonet.py
87 88 89 90 91 92 93 94 95 96 97 98 | |
mindcv.models.halonet.BottleneckBlock
¶
Bases: Cell
ResNet-like Bottleneck Block - 1x1 - kxk - 1x1
Source code in mindcv\models\halonet.py
324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 | |
mindcv.models.halonet.ConvBnAct
¶
Bases: Cell
Build a layer containing: conv - bn - act
Source code in mindcv\models\halonet.py
38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 | |
mindcv.models.halonet.HaloAttention
¶
Bases: Cell
The internal dimensions of the attention module are controlled by the interaction of several arguments:
- the output dimension: dim_out
- the value (v) dimension: dim_out // num_heads
- the query (q) and key (k) dimensions, determined by num_heads * dim_head, i.e. num_heads * (dim_out * attn_ratio // num_heads)
- the ratio of q and k relative to the output: attn_ratio
| PARAMETER | DESCRIPTION |
|---|---|
dim |
input dimension to the module
TYPE:
|
dim_out |
output dimension of the module, same as dim if not set
TYPE:
|
feat_size |
size of input feature_map (not used, for arg compat with bottle/lambda)
TYPE:
|
stride |
output stride of the module, query downscaled if > 1 (default: 1).
DEFAULT:
|
num_heads |
parallel attention heads (default: 8).
DEFAULT:
|
dim_head |
dimension of query and key heads, calculated from dim_out * attn_ratio // num_heads if not set
DEFAULT:
|
block_size |
size of blocks. (default: 8)
TYPE:
|
halo_size |
size of halo overlap. (default: 3)
TYPE:
|
qk_ratio |
ratio of q and k dimensions to output dimension when dim_head not set. (default: 1.0)
TYPE:
|
qkv_bias |
add bias to q, k, and v projections
TYPE:
|
avg_down |
use average pool downsample instead of strided query blocks
TYPE:
|
scale_pos_embed |
scale the position embedding as well as Q @ K
TYPE:
|
Source code in mindcv\models\halonet.py
204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 | |
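The dimension bookkeeping above can be checked with a small helper. Under the stated defaulting rule (dim_head falls back to dim_out * attn_ratio // num_heads; the function name is illustrative, not the mindcv API):

```python
def halo_attn_dims(dim_out, num_heads=8, attn_ratio=1.0, dim_head=None):
    """Return (dim_head, total q/k dim, per-head v dim) for HaloAttention-style sizing."""
    if dim_head is None:
        # Default head dim derived from the output dim and the q/k ratio.
        dim_head = int(dim_out * attn_ratio) // num_heads
    dim_qk = num_heads * dim_head   # total query/key dimension
    dim_v = dim_out // num_heads    # per-head value dimension
    return dim_head, dim_qk, dim_v
```

E.g. with dim_out=256 and 8 heads at attn_ratio=1.0, q/k use 32-dim heads (256 total) and v is 32 per head.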
mindcv.models.halonet.HaloNet
¶
Bases: Cell
Define main structure of HaloNet: stem - blocks - head
Source code in mindcv\models\halonet.py
528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 | |
mindcv.models.halonet.HaloStage
¶
Bases: Cell
Stage layers for HaloNet. A stage contains a number of blocks.
Source code in mindcv\models\halonet.py
457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 | |
mindcv.models.halonet.RelPosEmb
¶
Bases: Cell
Relative Position Embedding
Source code in mindcv\models\halonet.py
168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 | |
mindcv.models.halonet.RelPosEmb.__init__(block_size, win_size, dim_head)
¶
:param block_size (int): block size
:param win_size (int): neighbourhood window size
:param dim_head (int): attention head dim
:param scale (float): scale factor (for init)
Source code in mindcv\models\halonet.py
171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 | |
mindcv.models.halonet.SelectAdaptivePool2d
¶
Bases: Cell
Selectable global pooling layer with dynamic input kernel size
Source code in mindcv\models\halonet.py
101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 | |
mindcv.models.halonet.SelfAttnBlock
¶
Bases: Cell
ResNet-like Bottleneck Block - 1x1 - kxk - self attn - 1x1
Source code in mindcv\models\halonet.py
397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 | |
mindcv.models.halonet.halonet_50t(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get HaloNet model.
Refer to the base class models.HaloNet for more details.
Source code in mindcv\models\halonet.py
628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 | |
mindcv.models.halonet.rel_logits_1d(q, rel_k, permute_mask)
¶
Compute relative logits along one dimension
:param q: [batch, H, W, dim]
:param rel_k: [2 * window - 1, dim]
:param permute_mask: permute output axes according to this
Source code in mindcv\models\halonet.py
140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 | |
hrnet¶
mindcv.models.hrnet
¶
MindSpore implementation of HRNet.
Refer to Deep High-Resolution Representation Learning for Visual Recognition.
mindcv.models.hrnet.BasicBlock
¶
Bases: Cell
Basic block of HRNet
Source code in mindcv\models\hrnet.py
45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 | |
mindcv.models.hrnet.Bottleneck
¶
Bases: Cell
Bottleneck block of HRNet
Source code in mindcv\models\hrnet.py
101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 | |
mindcv.models.hrnet.HRModule
¶
Bases: Cell
High-Resolution Module for HRNet. Every branch in this module has 4 BasicBlocks/Bottlenecks, and feature fusion/exchange across branches happens in this module.
Source code in mindcv\models\hrnet.py
164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 | |
mindcv.models.hrnet.HRNet
¶
Bases: Cell
HRNet Backbone, based on
"Deep High-Resolution Representation Learning for Visual Recognition"
<https://arxiv.org/abs/1908.07919>_.
| PARAMETER | DESCRIPTION |
|---|---|
stage_cfg |
Configuration of the extra blocks. It accepts a dictionary
storing the detailed config of each block, which includes
TYPE:
|
num_classes |
number of classification classes. Default: 1000.
TYPE:
|
in_channels |
Number of input channels. Default: 3.
TYPE:
|
Source code in mindcv\models\hrnet.py
361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 | |
mindcv.models.hrnet.HRNet.forward_features(x)
¶
Perform the feature extraction.
| PARAMETER | DESCRIPTION |
|---|---|
x |
Tensor
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
List[Tensor]
|
Extracted features |
Source code in mindcv\models\hrnet.py
624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 | |
mindcv.models.hrnet.HRNetFeatures
¶
Bases: HRNet
The feature extraction version of HRNet
Source code in mindcv\models\hrnet.py
688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 | |
mindcv.models.hrnet.IdentityCell
¶
Bases: Cell
Identity Cell
Source code in mindcv\models\hrnet.py
35 36 37 38 39 40 41 42 | |
mindcv.models.hrnet.hrnet_w32(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get HRNet with width=32 model.
Refer to the base class models.HRNet for more details.
| PARAMETER | DESCRIPTION |
|---|---|
pretrained |
Whether the model is pretrained. Default: False
TYPE:
|
num_classes |
number of classification classes. Default: 1000
TYPE:
|
in_channels |
Number of input channels. Default: 3
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Union[HRNet, HRNetFeatures]
|
HRNet model |
Source code in mindcv\models\hrnet.py
760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 | |
mindcv.models.hrnet.hrnet_w48(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get HRNet with width=48 model.
Refer to the base class models.HRNet for more details.
| PARAMETER | DESCRIPTION |
|---|---|
pretrained |
Whether the model is pretrained. Default: False
TYPE:
|
num_classes |
number of classification classes. Default: 1000
TYPE:
|
in_channels |
Number of input channels. Default: 3
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Union[HRNet, HRNetFeatures]
|
HRNet model |
Source code in mindcv\models\hrnet.py
810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 | |
inceptionv3¶
mindcv.models.inceptionv3
¶
MindSpore implementation of InceptionV3.
Refer to Rethinking the Inception Architecture for Computer Vision.
mindcv.models.inceptionv3.BasicConv2d
¶
Bases: Cell
A block for conv bn and relu
Source code in mindcv\models\inceptionv3.py
37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 | |
mindcv.models.inceptionv3.InceptionAux
¶
Bases: Cell
Inception module for the aux classifier head
Source code in mindcv\models\inceptionv3.py
200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 | |
mindcv.models.inceptionv3.InceptionV3
¶
Bases: Cell
Inception v3 model architecture from
"Rethinking the Inception Architecture for Computer Vision" <https://arxiv.org/abs/1512.00567>_.
.. note:: Important: in contrast to the other models, inception_v3 expects tensors of size N x 3 x 299 x 299, so ensure your images are sized accordingly.
| PARAMETER | DESCRIPTION |
|---|---|
num_classes |
number of classification classes. Default: 1000.
TYPE:
|
aux_logits |
use auxiliary classifier or not. Default: False.
TYPE:
|
in_channels |
number of input channels. Default: 3.
TYPE:
|
drop_rate |
dropout rate of the layer before main classifier. Default: 0.2.
TYPE:
|
Source code in mindcv\models\inceptionv3.py
224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 | |
mindcv.models.inceptionv3.inception_v3(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get InceptionV3 model.
Refer to the base class models.InceptionV3 for more details.
Source code in mindcv\models\inceptionv3.py
328 329 330 331 332 333 334 335 336 337 338 | |
inceptionv4¶
mindcv.models.inceptionv4
¶
MindSpore implementation of InceptionV4.
Refer to Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning.
mindcv.models.inceptionv4.BasicConv2d
¶
Bases: Cell
A block for conv bn and relu
Source code in mindcv\models\inceptionv4.py
37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 | |
mindcv.models.inceptionv4.InceptionA
¶
Bases: Cell
Inception-A block of the Inception V4 architecture
Source code in mindcv\models\inceptionv4.py
108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 | |
mindcv.models.inceptionv4.InceptionB
¶
Bases: Cell
Inception-B block of the Inception V4 architecture
Source code in mindcv\models\inceptionv4.py
137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 | |
mindcv.models.inceptionv4.InceptionC
¶
Bases: Cell
Inception-C block of the Inception V4 architecture
Source code in mindcv\models\inceptionv4.py
215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 | |
mindcv.models.inceptionv4.InceptionV4
¶
Bases: Cell
Inception v4 model architecture from
"Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning" <https://arxiv.org/abs/1602.07261>_. # noqa: E501
| PARAMETER | DESCRIPTION |
|---|---|
num_classes |
number of classification classes. Default: 1000.
TYPE:
|
in_channels |
number of input channels. Default: 3.
TYPE:
|
drop_rate |
dropout rate of the layer before main classifier. Default: 0.2.
TYPE:
|
Source code in mindcv\models\inceptionv4.py
253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 | |
mindcv.models.inceptionv4.ReductionA
¶
Bases: Cell
Reduction-A block of Inception V4
Source code in mindcv\models\inceptionv4.py, lines 169-187
mindcv.models.inceptionv4.ReductionB
¶
Bases: Cell
Reduction-B block of Inception V4
Source code in mindcv\models\inceptionv4.py, lines 190-212
mindcv.models.inceptionv4.Stem
¶
Bases: Cell
Stem block of Inception V4.
Source code in mindcv\models\inceptionv4.py, lines 62-105
mindcv.models.inceptionv4.inception_v4(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get InceptionV4 model.
Refer to the base class models.InceptionV4 for more details.
Source code in mindcv\models\inceptionv4.py, lines 310-320
mae¶
mindcv.models.mae
¶
MindSpore implementation of MAE.
Refer to Masked Autoencoders Are Scalable Vision Learners.
mindcv.models.mae.MAEForPretrain
¶
Bases: Cell
Source code in mindcv\models\mae.py, lines 96-303
mindcv.models.mae.MAEForPretrain.patchify(imgs)
¶
Source code in mindcv\models\mae.py, lines 195-208
mindcv.models.mae.MAEForPretrain.unpatchify(x)
¶
Source code in mindcv\models\mae.py, lines 210-223
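patchify and unpatchify convert between an image tensor and the flattened patch sequence the MAE encoder operates on. A NumPy sketch of the round trip (shapes follow the usual MAE convention; the exact reshape order in mindcv may differ):

```python
import numpy as np

def patchify(imgs, p):
    # (N, C, H, W) -> (N, L, p*p*C), where L = (H/p) * (W/p)
    n, c, h, w = imgs.shape
    assert h % p == 0 and w % p == 0
    x = imgs.reshape(n, c, h // p, p, w // p, p)
    x = x.transpose(0, 2, 4, 3, 5, 1)  # (N, H/p, W/p, p, p, C)
    return x.reshape(n, (h // p) * (w // p), p * p * c)

def unpatchify(x, p, c=3):
    # inverse of patchify: (N, L, p*p*C) -> (N, C, H, W)
    n, l, _ = x.shape
    hp = wp = int(l ** 0.5)
    x = x.reshape(n, hp, wp, p, p, c)
    x = x.transpose(0, 5, 1, 3, 2, 4)  # (N, C, H/p, p, W/p, p)
    return x.reshape(n, c, hp * p, wp * p)
```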
mindcv.models.mae.get_1d_sincos_pos_embed_from_grid(embed_dim, pos)
¶
Source code in mindcv\models\mae.py, lines 75-93
mindcv.models.mae.get_2d_sincos_pos_embed(embed_dim, grid_size, cls_token=False)
¶
Source code in mindcv\models\mae.py, lines 46-61
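These positional embeddings are fixed sine/cosine functions of the patch position. A NumPy sketch of the 1-D case (the 2-D version applies this to the grid's x and y coordinates and concatenates the two halves):

```python
import numpy as np

def sincos_pos_embed_1d(embed_dim, pos):
    # pos: (M,) positions; returns (M, embed_dim), the first half
    # sine and the second half cosine of pos scaled by frequencies
    assert embed_dim % 2 == 0
    omega = 1.0 / 10000 ** (np.arange(embed_dim // 2) / (embed_dim / 2.0))
    out = np.einsum("m,d->md", pos, omega)  # (M, embed_dim/2)
    return np.concatenate([np.sin(out), np.cos(out)], axis=1)
```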
mixnet¶
mindcv.models.mixnet
¶
MindSpore implementation of MixNet.
Refer to MixConv: Mixed Depthwise Convolutional Kernels
mindcv.models.mixnet.MDConv
¶
Bases: Cell
Mixed Depth-wise Convolution
Source code in mindcv\models\mixnet.py, lines 119-166
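MDConv splits the input channels into groups and applies a different depthwise kernel size to each group. A sketch of the usual bookkeeping (hypothetical helper names; mindcv's internal split may differ in detail):

```python
def split_channels(total, n_groups):
    # even split, with the remainder assigned to the first group
    split = [total // n_groups] * n_groups
    split[0] += total - sum(split)
    return split

def mdconv_kernel_sizes(n_groups):
    # group i uses a (2i+3) x (2i+3) depthwise kernel: 3, 5, 7, ...
    return [2 * i + 3 for i in range(n_groups)]
```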
mindcv.models.mixnet.MixNet
¶
Bases: Cell
MixNet model class, based on
"MixConv: Mixed Depthwise Convolutional Kernels" <https://arxiv.org/abs/1907.09595>_
| PARAMETER | DESCRIPTION |
|---|---|
arch |
size of the architecture. "small", "medium" or "large". Default: "small".
TYPE:
|
num_classes |
number of classification classes. Default: 1000.
TYPE:
|
in_channels |
number of channels of the input. Default: 3.
TYPE:
|
feature_size |
number of channels of the output features. Default: 1536.
TYPE:
|
drop_rate |
rate of dropout for classifier. Default: 0.2.
TYPE:
|
depth_multiplier |
expansion coefficient of channels. Default: 1.0.
TYPE:
|
Source code in mindcv\models\mixnet.py, lines 227-384
mindcv.models.mixnet.MixNetBlock
¶
Bases: Cell
Basic Block of MixNet
Source code in mindcv\models\mixnet.py, lines 169-224
mlpmixer¶
mindcv.models.mlpmixer
¶
MindSpore implementation of MLP-Mixer.
Refer to MLP-Mixer: An all-MLP Architecture for Vision.
mindcv.models.mlpmixer.FeedForward
¶
Bases: Cell
Feed-forward block (MLP layer): FC -> GELU -> FC
Source code in mindcv\models\mlpmixer.py, lines 46-60
mindcv.models.mlpmixer.MLPMixer
¶
Bases: Cell
MLP-Mixer model class, based on
"MLP-Mixer: An all-MLP Architecture for Vision" <https://arxiv.org/abs/2105.01601>_
| PARAMETER | DESCRIPTION |
|---|---|
depth |
number of MixerBlocks.
TYPE:
|
patch_size |
size of a single image patch.
TYPE:
|
n_patches |
number of patches.
TYPE:
|
n_channels |
channels(dimension) of a single embedded patch.
TYPE:
|
token_dim |
hidden dim of token-mixing MLP.
TYPE:
|
channel_dim |
hidden dim of channel-mixing MLP.
TYPE:
|
num_classes |
number of classification classes.
TYPE:
|
in_channels |
number of channels of the input. Default: 3.
DEFAULT:
|
Source code in mindcv\models\mlpmixer.py, lines 104-146
mindcv.models.mlpmixer.MixerBlock
¶
Bases: Cell
Mixer Layer with token-mixing MLP and channel-mixing MLP
Source code in mindcv\models\mlpmixer.py, lines 82-101
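A MixerBlock alternates a token-mixing MLP (applied across patches, via a transpose) with a channel-mixing MLP (applied across channels), each with a residual connection. A NumPy sketch with layer normalization omitted for brevity:

```python
import numpy as np

def mlp(x, w1, w2):
    # FC -> GELU (tanh approximation) -> FC, along the last axis
    h = x @ w1
    h = 0.5 * h * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (h + 0.044715 * h ** 3)))
    return h @ w2

def mixer_block(x, tok_w1, tok_w2, ch_w1, ch_w2):
    # x: (patches, channels). Token mixing operates on the transposed
    # tensor so the MLP mixes information across patches.
    x = x + mlp(x.T, tok_w1, tok_w2).T  # token-mixing MLP
    x = x + mlp(x, ch_w1, ch_w2)        # channel-mixing MLP
    return x
```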
mindcv.models.mlpmixer.TransPose
¶
Bases: Cell
TransPose layer. Wraps the Transpose operator for easy integration into nn.SequentialCell
Source code in mindcv\models\mlpmixer.py, lines 63-79
mnasnet¶
mindcv.models.mnasnet
¶
MindSpore implementation of MnasNet.
Refer to MnasNet: Platform-Aware Neural Architecture Search for Mobile.
mindcv.models.mnasnet.Mnasnet
¶
Bases: Cell
MnasNet model architecture from
"MnasNet: Platform-Aware Neural Architecture Search for Mobile" <https://arxiv.org/abs/1807.11626>_.
| PARAMETER | DESCRIPTION |
|---|---|
alpha |
scale factor of model width.
TYPE:
|
in_channels |
number of channels of the input. Default: 3.
TYPE:
|
num_classes |
number of classification classes. Default: 1000.
TYPE:
|
drop_rate |
dropout rate of the layer before main classifier. Default: 0.2.
TYPE:
|
Source code in mindcv\models\mnasnet.py, lines 81-177
mindcv.models.mnasnet.mnasnet_050(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MnasNet model with width scaled by 0.5.
Refer to the base class models.Mnasnet for more details.
Source code in mindcv\models\mnasnet.py, lines 180-190
mindcv.models.mnasnet.mnasnet_075(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MnasNet model with width scaled by 0.75.
Refer to the base class models.Mnasnet for more details.
Source code in mindcv\models\mnasnet.py, lines 193-203
mindcv.models.mnasnet.mnasnet_100(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MnasNet model with width scaled by 1.0.
Refer to the base class models.Mnasnet for more details.
Source code in mindcv\models\mnasnet.py, lines 206-216
mindcv.models.mnasnet.mnasnet_130(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MnasNet model with width scaled by 1.3.
Refer to the base class models.Mnasnet for more details.
Source code in mindcv\models\mnasnet.py, lines 219-229
mindcv.models.mnasnet.mnasnet_140(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MnasNet model with width scaled by 1.4.
Refer to the base class models.Mnasnet for more details.
Source code in mindcv\models\mnasnet.py, lines 232-242
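The width-scaled variants above multiply every channel count by the scale factor and round the result to a hardware-friendly multiple of 8. A sketch of the make_divisible helper commonly used for this (the exact rounding rule in mindcv may differ slightly):

```python
def make_divisible(value, divisor=8, min_value=None):
    # round a scaled channel count to a multiple of `divisor`,
    # never dropping below 90% of the original value
    if min_value is None:
        min_value = divisor
    new_value = max(min_value, int(value + divisor / 2) // divisor * divisor)
    if new_value < 0.9 * value:
        new_value += divisor
    return new_value

# channel counts of a 32-channel layer under the mnasnet width scales
scaled = [make_divisible(32 * alpha) for alpha in (0.5, 0.75, 1.0, 1.3, 1.4)]
```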
mobilenetv1¶
mindcv.models.mobilenetv1
¶
MindSpore implementation of MobileNetV1.
Refer to MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications.
mindcv.models.mobilenetv1.MobileNetV1
¶
Bases: Cell
MobileNetV1 model class, based on
"MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications" <https://arxiv.org/abs/1704.04861>_ # noqa: E501
| PARAMETER | DESCRIPTION |
|---|---|
alpha |
scale factor of model width. Default: 1.
TYPE:
|
in_channels |
number of channels of the input. Default: 3.
TYPE:
|
num_classes |
number of classification classes. Default: 1000.
TYPE:
|
Source code in mindcv\models\mobilenetv1.py, lines 61-134
mindcv.models.mobilenetv1.mobilenet_v1_025(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MobileNetV1 model with width scaled by 0.25.
Refer to the base class models.MobileNetV1 for more details.
Source code in mindcv\models\mobilenetv1.py, lines 137-148
mindcv.models.mobilenetv1.mobilenet_v1_050(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MobileNetV1 model with width scaled by 0.5.
Refer to the base class models.MobileNetV1 for more details.
Source code in mindcv\models\mobilenetv1.py, lines 151-162
mindcv.models.mobilenetv1.mobilenet_v1_075(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MobileNetV1 model with width scaled by 0.75.
Refer to the base class models.MobileNetV1 for more details.
Source code in mindcv\models\mobilenetv1.py, lines 165-176
mindcv.models.mobilenetv1.mobilenet_v1_100(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MobileNetV1 model without width scaling.
Refer to the base class models.MobileNetV1 for more details.
Source code in mindcv\models\mobilenetv1.py, lines 179-190
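MobileNetV1's efficiency comes from replacing standard convolutions with depthwise-separable ones (a depthwise k x k convolution followed by a pointwise 1x1 convolution). The parameter arithmetic, with bias terms omitted:

```python
def conv_params(c_in, c_out, k):
    # parameters of a standard k x k convolution
    return c_in * c_out * k * k

def separable_params(c_in, c_out, k):
    # depthwise k x k conv (c_in filters) + pointwise 1x1 conv
    return c_in * k * k + c_in * c_out

standard = conv_params(128, 256, 3)       # 128 * 256 * 9
separable = separable_params(128, 256, 3) # 128 * 9 + 128 * 256
```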
mobilenetv2¶
mindcv.models.mobilenetv2
¶
MindSpore implementation of MobileNetV2.
Refer to MobileNetV2: Inverted Residuals and Linear Bottlenecks.
mindcv.models.mobilenetv2.InvertedResidual
¶
Bases: Cell
Inverted Residual Block of MobileNetV2
Source code in mindcv\models\mobilenetv2.py, lines 123-160
mindcv.models.mobilenetv2.MobileNetV2
¶
Bases: Cell
MobileNetV2 model class, based on
"MobileNetV2: Inverted Residuals and Linear Bottlenecks" <https://arxiv.org/abs/1801.04381>_
| PARAMETER | DESCRIPTION |
|---|---|
alpha |
scale factor of model width. Default: 1.
TYPE:
|
round_nearest |
divisor of make divisible function. Default: 8.
TYPE:
|
in_channels |
number of channels of the input. Default: 3.
TYPE:
|
num_classes |
number of classification classes. Default: 1000.
TYPE:
|
Source code in mindcv\models\mobilenetv2.py, lines 163-275
mindcv.models.mobilenetv2.mobilenet_v2_035_128(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MobileNetV2 model with width scaled by 0.35 and input image size of 128.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindcv\models\mobilenetv2.py, lines 482-489
mindcv.models.mobilenetv2.mobilenet_v2_035_160(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MobileNetV2 model with width scaled by 0.35 and input image size of 160.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindcv\models\mobilenetv2.py, lines 472-479
mindcv.models.mobilenetv2.mobilenet_v2_035_192(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MobileNetV2 model with width scaled by 0.35 and input image size of 192.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindcv\models\mobilenetv2.py, lines 462-469
mindcv.models.mobilenetv2.mobilenet_v2_035_224(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MobileNetV2 model with width scaled by 0.35 and input image size of 224.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindcv\models\mobilenetv2.py, lines 452-459
mindcv.models.mobilenetv2.mobilenet_v2_035_96(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MobileNetV2 model with width scaled by 0.35 and input image size of 96.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindcv\models\mobilenetv2.py, lines 492-499
mindcv.models.mobilenetv2.mobilenet_v2_050_128(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MobileNetV2 model with width scaled by 0.5 and input image size of 128.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindcv\models\mobilenetv2.py, lines 432-439
mindcv.models.mobilenetv2.mobilenet_v2_050_160(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MobileNetV2 model with width scaled by 0.5 and input image size of 160.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindcv\models\mobilenetv2.py, lines 422-429
mindcv.models.mobilenetv2.mobilenet_v2_050_192(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MobileNetV2 model with width scaled by 0.5 and input image size of 192.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindcv\models\mobilenetv2.py, lines 412-419
mindcv.models.mobilenetv2.mobilenet_v2_050_224(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MobileNetV2 model with width scaled by 0.5 and input image size of 224.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindcv\models\mobilenetv2.py, lines 402-409
mindcv.models.mobilenetv2.mobilenet_v2_050_96(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MobileNetV2 model with width scaled by 0.5 and input image size of 96.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindcv\models\mobilenetv2.py, lines 442-449
mindcv.models.mobilenetv2.mobilenet_v2_075(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MobileNetV2 model with width scaled by 0.75 and input image size of 224.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindcv\models\mobilenetv2.py, lines 352-359
mindcv.models.mobilenetv2.mobilenet_v2_075_128(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MobileNetV2 model with width scaled by 0.75 and input image size of 128.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindcv\models\mobilenetv2.py, lines 382-389
mindcv.models.mobilenetv2.mobilenet_v2_075_160(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MobileNetV2 model with width scaled by 0.75 and input image size of 160.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindcv\models\mobilenetv2.py, lines 372-379
mindcv.models.mobilenetv2.mobilenet_v2_075_192(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MobileNetV2 model with width scaled by 0.75 and input image size of 192.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindcv\models\mobilenetv2.py, lines 362-369
mindcv.models.mobilenetv2.mobilenet_v2_075_96(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MobileNetV2 model with width scaled by 0.75 and input image size of 96.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindcv\models\mobilenetv2.py, lines 392-399
mindcv.models.mobilenetv2.mobilenet_v2_100(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MobileNetV2 model without width scaling and input image size of 224.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindcv\models\mobilenetv2.py, lines 302-309
mindcv.models.mobilenetv2.mobilenet_v2_100_128(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MobileNetV2 model without width scaling and input image size of 128.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindcv\models\mobilenetv2.py, lines 332-339
mindcv.models.mobilenetv2.mobilenet_v2_100_160(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MobileNetV2 model without width scaling and input image size of 160.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindcv\models\mobilenetv2.py, lines 322-329
mindcv.models.mobilenetv2.mobilenet_v2_100_192(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MobileNetV2 model without width scaling and input image size of 192.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindcv\models\mobilenetv2.py, lines 312-319
mindcv.models.mobilenetv2.mobilenet_v2_100_96(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MobileNetV2 model without width scaling and input image size of 96.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindcv\models\mobilenetv2.py, lines 342-349
mindcv.models.mobilenetv2.mobilenet_v2_130_224(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MobileNetV2 model with width scaled by 1.3 and input image size of 224.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindcv\models\mobilenetv2.py, lines 292-299
mindcv.models.mobilenetv2.mobilenet_v2_140(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get MobileNetV2 model with width scaled by 1.4 and input image size of 224.
Refer to the base class models.MobileNetV2 for more details.
Source code in mindcv\models\mobilenetv2.py, lines 282-289
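The inverted residual block underlying all the variants above expands the channels with a 1x1 convolution, filters with a 3x3 depthwise convolution, and projects back with a linear 1x1 convolution; the residual connection is used only when the block preserves the tensor shape. A sketch of the channel bookkeeping (hypothetical helper; mindcv's layer layout may differ in detail):

```python
def inverted_residual_plan(c_in, c_out, stride, expand_ratio):
    # channel plan of one inverted residual block:
    # 1x1 expand -> 3x3 depthwise (stride) -> 1x1 linear projection
    hidden = c_in * expand_ratio
    layers = []
    if expand_ratio != 1:
        layers.append(("conv1x1_expand", c_in, hidden))
    layers.append(("dwconv3x3", hidden, hidden))
    layers.append(("conv1x1_project", hidden, c_out))
    use_residual = stride == 1 and c_in == c_out
    return layers, use_residual
```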
mobilenetv3¶
mindcv.models.mobilenetv3
¶
MindSpore implementation of MobileNetV3.
Refer to Searching for MobileNetV3.
mindcv.models.mobilenetv3.Bottleneck
¶
Bases: Cell
Bottleneck block of MobileNetV3: depth-wise separable convolution + inverted residual + squeeze-and-excitation
Source code in mindcv\models\mobilenetv3.py, lines 48-97
mindcv.models.mobilenetv3.MobileNetV3
¶
Bases: Cell
MobileNetV3 model class, based on
"Searching for MobileNetV3" <https://arxiv.org/abs/1905.02244>_
| PARAMETER | DESCRIPTION |
|---|---|
arch |
size of the architecture. 'small' or 'large'.
TYPE:
|
alpha |
scale factor of model width. Default: 1.
TYPE:
|
round_nearest |
divisor of make divisible function. Default: 8.
TYPE:
|
in_channels |
number of channels of the input. Default: 3.
TYPE:
|
num_classes |
number of classification classes. Default: 1000.
TYPE:
|
Source code in mindcv\models\mobilenetv3.py, lines 100-242
mindcv.models.mobilenetv3.mobilenet_v3_large_075(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get large MobileNetV3 model with width scaled by 0.75.
Refer to the base class models.MobileNetV3 for more details.
Source code in mindcv\models\mobilenetv3.py, lines 279-286
mindcv.models.mobilenetv3.mobilenet_v3_large_100(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get large MobileNetV3 model without width scaling.
Refer to the base class models.MobileNetV3 for more details.
Source code in mindcv\models\mobilenetv3.py, lines 259-266
mindcv.models.mobilenetv3.mobilenet_v3_small_075(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get small MobileNetV3 model with width scaled by 0.75.
Refer to the base class models.MobileNetV3 for more details.
Source code in mindcv\models\mobilenetv3.py, lines 269-276
mindcv.models.mobilenetv3.mobilenet_v3_small_100(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get small MobileNetV3 model without width scaling.
Refer to the base class models.MobileNetV3 for more details.
Source code in mindcv\models\mobilenetv3.py, lines 249-256
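MobileNetV3 replaces the sigmoid and swish activations with cheap piecewise-linear "hard" variants built from ReLU6, as described in the paper. A NumPy sketch:

```python
import numpy as np

def relu6(x):
    return np.minimum(np.maximum(x, 0.0), 6.0)

def hard_swish(x):
    # h-swish(x) = x * ReLU6(x + 3) / 6
    return x * relu6(x + 3.0) / 6.0

def hard_sigmoid(x):
    # gating activation used in the squeeze-and-excitation branch
    return relu6(x + 3.0) / 6.0
```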
mobilevit¶
mindcv.models.mobilevit
¶
MindSpore implementation of MobileViT.
Refer to MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer.
mindcv.models.mobilevit.ConvLayer
¶
Bases: Cell
Conv2d + BN + Act
Source code in mindcv\models\mobilevit.py, lines 44-91
mindcv.models.mobilevit.InvertedResidual
¶
Bases: Cell
This class implements the inverted residual block, as described in
MobileNetV2 <https://arxiv.org/abs/1801.04381>_ paper
| PARAMETER | DESCRIPTION |
|---|---|
in_channels |
C_{in} from an expected input of size (N, C_{in}, H_{in}, W_{in})
TYPE:
|
out_channels |
C_{out} from an expected output of size (N, C_{out}, H_{out}, W_{out})
TYPE:
|
stride |
Use convolutions with a stride. Default: 1
TYPE:
|
expand_ratio |
Expand the input channels by this factor in depth-wise conv
TYPE:
|
skip_connection |
Use skip-connection. Default: True
TYPE:
|
Shape
- Input: (N, C_{in}, H_{in}, W_{in})
- Output: (N, C_{out}, H_{out}, W_{out})
.. note::
If in_channels != out_channels or stride > 1, we set skip_connection=False
Source code in mindcv\models\mobilevit.py, lines 94-175
mindcv.models.mobilevit.MobileViT
¶
Bases: Cell
This class implements the MobileViT architecture <https://arxiv.org/abs/2110.02178?context=cs.LG>_
Source code in mindcv\models\mobilevit.py, lines 500-639
mindcv.models.mobilevit.MobileViTBlock
¶
Bases: Cell
This class defines the MobileViT block <https://arxiv.org/abs/2110.02178?context=cs.LG>_
| PARAMETER | DESCRIPTION |
|---|---|
opts |
command line arguments
|
in_channels |
:math:
TYPE:
|
transformer_dim |
Input dimension to the transformer unit
TYPE:
|
ffn_dim |
Dimension of the FFN block
TYPE:
|
n_transformer_blocks |
Number of transformer blocks. Default: 2
TYPE:
|
head_dim |
Head dimension in the multi-head attention. Default: 32
TYPE:
|
attn_dropout |
Dropout in multi-head attention. Default: 0.0
TYPE:
|
dropout |
Dropout rate. Default: 0.0
TYPE:
|
ffn_dropout |
Dropout between FFN layers in transformer. Default: 0.0
TYPE:
|
patch_h |
Patch height for unfolding operation. Default: 8
TYPE:
|
patch_w |
Patch width for unfolding operation. Default: 8
TYPE:
|
transformer_norm_layer |
Normalization layer in the transformer block. Default: layer_norm
TYPE:
|
conv_ksize |
Kernel size to learn local representations in MobileViT block. Default: 3
TYPE:
|
no_fusion |
Do not combine the input and output feature maps. Default: False
TYPE:
|
Source code in mindcv\models\mobilevit.py, lines 310-497
mindcv.models.mobilevit.MultiHeadAttention
¶
Bases: Cell
This layer applies a multi-head self- or cross-attention as described in
Attention is all you need <https://arxiv.org/abs/1706.03762>_ paper
| PARAMETER | DESCRIPTION |
|---|---|
embed_dim |
C_{in} from an expected input of size (N, P, C_{in})
TYPE:
|
num_heads |
Number of heads in multi-head attention
TYPE:
|
attn_dropout |
Attention dropout. Default: 0.0
TYPE:
|
bias |
Use bias or not. Default:
TYPE:
|
Shape
- Input: (N, P, C_{in}), where N is batch size, P is number of patches, and C_{in} is input embedding dim
- Output: same shape as the input
Source code in mindcv\models\mobilevit.py, lines 178-239
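The multi-head attention described above projects the (P, C) patch sequence to queries, keys, and values, splits the embedding into heads, and applies scaled dot-product attention per head. A batch-free NumPy sketch with dropout and bias omitted:

```python
import numpy as np

def multi_head_attention(x, wq, wk, wv, wo, num_heads):
    # x: (P, C); project, split heads, attend, merge, project back
    p, c = x.shape
    d = c // num_heads

    def heads(t):  # (P, C) -> (num_heads, P, d)
        return t.reshape(p, num_heads, d).transpose(1, 0, 2)

    q, k, v = heads(x @ wq), heads(x @ wk), heads(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)   # (heads, P, P)
    attn = np.exp(scores - scores.max(-1, keepdims=True))
    attn = attn / attn.sum(-1, keepdims=True)        # softmax over keys
    out = (attn @ v).transpose(1, 0, 2).reshape(p, c)
    return out @ wo
```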
mindcv.models.mobilevit.TransformerEncoder
¶
Bases: Cell
This class defines the pre-norm Transformer encoder <https://arxiv.org/abs/1706.03762>_
Args:
embed_dim (int): C_{in} from an expected input of size (N, P, C_{in})
ffn_latent_dim (int): Inner dimension of the FFN
num_heads (int) : Number of heads in multi-head attention. Default: 8
attn_dropout (float): Dropout rate for attention in multi-head attention. Default: 0.0
dropout (float): Dropout rate. Default: 0.0
ffn_dropout (float): Dropout between FFN layers. Default: 0.0
Shape
- Input: (N, P, C_{in}), where N is batch size, P is number of patches, and C_{in} is input embedding dim
- Output: same shape as the input
Source code in mindcv\models\mobilevit.py, lines 242-307
nasnet¶
mindcv.models.nasnet
¶
MindSpore implementation of NasNet.
Refer to Learning Transferable Architectures for Scalable Image Recognition.
mindcv.models.nasnet.BranchSeparables
¶
Bases: Cell
NasNet model basic architecture
Source code in mindcv\models\nasnet.py, lines 61-92
mindcv.models.nasnet.BranchSeparablesReduction
¶
Bases: BranchSeparables
NasNet model Residual Connections
Source code in mindcv\models\nasnet.py, lines 129-156
mindcv.models.nasnet.BranchSeparablesStem
¶
Bases: Cell
NasNet model basic architecture
Source code in mindcv\models\nasnet.py, lines 95-126
mindcv.models.nasnet.CellStem0
¶
Bases: Cell
NasNet model basic architecture
Source code in mindcv\models\nasnet.py, lines 159-224
mindcv.models.nasnet.CellStem1
¶
Bases: Cell
NasNet model basic architecture
Source code in mindcv\models\nasnet.py, lines 227-344
mindcv.models.nasnet.FirstCell
¶
Bases: Cell
NasNet model basic architecture
Source code in mindcv\models\nasnet.py, lines 347-436
mindcv.models.nasnet.NASNetAMobile
¶
Bases: Cell
NasNet model class, based on
"Learning Transferable Architectures for Scalable Image Recognition" <https://arxiv.org/pdf/1707.07012v4.pdf>_
Args:
num_classes: number of classification classes.
stem_filters: number of stem filters. Default: 32.
penultimate_filters: number of penultimate filters. Default: 1056.
filters_multiplier: size of filters multiplier. Default: 2.
Source code in mindcv\models\nasnet.py, lines 681-871
mindcv.models.nasnet.NASNetAMobile.forward_features(x)
¶
Network forward feature extraction.
Source code in mindcv\models\nasnet.py, lines 834-860
mindcv.models.nasnet.NormalCell
¶
Bases: Cell
NasNet model basic architecture
Source code in mindcv\models\nasnet.py
(lines 439–505)
mindcv.models.nasnet.ReductionCell0
¶
Bases: Cell
NasNet reduction cell
Source code in mindcv\models\nasnet.py
(lines 508–579)
mindcv.models.nasnet.ReductionCell1
¶
Bases: Cell
NasNet reduction cell
Source code in mindcv\models\nasnet.py
(lines 582–678)
mindcv.models.nasnet.SeparableConv2d
¶
Bases: Cell
depthwise convolution followed by pointwise convolution
Source code in mindcv\models\nasnet.py
(lines 36–58)
mindcv.models.nasnet.nasnet_a_4x1056(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get NasNet model.
Refer to the base class models.NASNetAMobile for more details.
Source code in mindcv\models\nasnet.py
(lines 874–882)
pit¶
mindcv.models.pit
¶
MindSpore implementation of PiT.
Refer to Rethinking Spatial Dimensions of Vision Transformers.
mindcv.models.pit.Attention
¶
Bases: Cell
define multi-head self attention block
Source code in mindcv\models\pit.py
(lines 110–153)
mindcv.models.pit.Block
¶
Bases: Cell
define the basic block of PiT
Source code in mindcv\models\pit.py
(lines 156–182)
mindcv.models.pit.Mlp
¶
Bases: Cell
MLP as used in Vision Transformer, MLP-Mixer and related networks
Source code in mindcv\models\pit.py
(lines 185–210)
mindcv.models.pit.PoolingTransformer
¶
Bases: Cell
PiT model class, based on
"Rethinking Spatial Dimensions of Vision Transformers"
<https://arxiv.org/abs/2103.16302>
Args:
image_size (int) : input image size.
patch_size (int) : image patch size.
stride (int) : stride of the depthwise conv.
base_dims (List[int]) : middle dim of each layer.
depth (List[int]) : model block depth of each layer.
heads (List[int]) : number of heads of multi-head attention in each layer.
mlp_ratio (float) : ratio of hidden features in Mlp.
num_classes (int) : number of classification classes. Default: 1000.
in_chans (int) : number of input channels. Default: 3.
attn_drop_rate (float) : attention layers dropout rate. Default: 0.
drop_rate (float) : dropout rate. Default: 0.
drop_path_rate (float) : drop path rate. Default: 0.
Source code in mindcv\models\pit.py
(lines 265–400)
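Each conv_head_pooling stage roughly halves the spatial grid (a stride-2 conv) while doubling the embedding dim, so the token count drops about 4x per stage as per-token width grows 2x. A hypothetical bookkeeping helper, not part of mindcv (the exact per-stage sizes also depend on the conv padding):

```python
def pooling_schedule(h, w, dim, num_stages):
    """Illustrative PiT-style pooling schedule: per-stage
    (token_count, embed_dim) pairs, assuming each pooling step
    halves H and W (ceil division) and doubles the embed dim."""
    shapes = [(h * w, dim)]
    for _ in range(num_stages - 1):
        h, w, dim = (h + 1) // 2, (w + 1) // 2, dim * 2
        shapes.append((h * w, dim))
    return shapes
```

For a 14x14 grid with dim 64 and 3 stages this gives `[(196, 64), (49, 128), (16, 256)]`.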
mindcv.models.pit.Transformer
¶
Bases: Cell
define the transformer block of PiT
Source code in mindcv\models\pit.py
(lines 213–262)
mindcv.models.pit.conv_embedding
¶
Bases: Cell
define embedding layer using conv2d
Source code in mindcv\models\pit.py
(lines 52–76)
mindcv.models.pit.conv_head_pooling
¶
Bases: Cell
define a pooling layer using a strided conv on the spatial tokens, with an additional fully-connected layer that adjusts the channel size to match the spatial tokens
Source code in mindcv\models\pit.py
(lines 79–107)
mindcv.models.pit.pit_b(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get PiT-B model.
Refer to the base class models.PoolingTransformer for more details.
Source code in mindcv\models\pit.py
(lines 475–496)
mindcv.models.pit.pit_s(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get PiT-S model.
Refer to the base class models.PoolingTransformer for more details.
Source code in mindcv\models\pit.py
(lines 451–472)
mindcv.models.pit.pit_ti(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get PiT-Ti model.
Refer to the base class models.PoolingTransformer for more details.
Source code in mindcv\models\pit.py
(lines 403–424)
mindcv.models.pit.pit_xs(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get PiT-XS model.
Refer to the base class models.PoolingTransformer for more details.
Source code in mindcv\models\pit.py
(lines 427–448)
poolformer¶
mindcv.models.poolformer
¶
MindSpore implementation of poolformer.
Refer to PoolFormer: MetaFormer Is Actually What You Need for Vision.
mindcv.models.poolformer.ConvMlp
¶
Bases: Cell
MLP using 1x1 convs that keeps spatial dims
Source code in mindcv\models\poolformer.py
(lines 63–104)
mindcv.models.poolformer.ConvMlp.cls_init_weights()
¶
Initialize weights for cells.
Source code in mindcv\models\poolformer.py
(lines 88–96)
mindcv.models.poolformer.PatchEmbed
¶
Bases: Cell
Patch Embedding implemented by a conv layer. Input: tensor of shape [B, C, H, W]. Output: tensor of shape [B, C, H/stride, W/stride].
Source code in mindcv\models\poolformer.py
(lines 107–124)
mindcv.models.poolformer.PoolFormer
¶
Bases: Cell
PoolFormer model class, based on
"MetaFormer Is Actually What You Need for Vision" <https://arxiv.org/pdf/2111.11418v3.pdf>_
| PARAMETER | DESCRIPTION |
|---|---|
| layers | number of blocks for the 4 stages |
| embed_dims | embedding dims for the 4 stages. Default: (64, 128, 320, 512) |
| mlp_ratios | MLP ratios for the 4 stages. Default: (4, 4, 4, 4) |
| downsamples | flags to apply downsampling or not. Default: (True, True, True, True) |
| pool_size | pooling size for the 4 stages. Default: 3 |
| in_chans | number of input channels. Default: 3 |
| num_classes | number of classes for image classification. Default: 1000 |
| global_pool | type of pooling layer. Default: avg |
| norm_layer | type of normalization. Default: nn.GroupNorm |
| act_layer | type of activation. Default: nn.GELU |
| in_patch_size | patch size of the input patch embedding. Default: 7 |
| in_stride | stride of the input patch embedding. Default: 4 |
| in_pad | padding of the input patch embedding. Default: 2 |
| down_patch_size | patch size of the downsampling patch embedding. Default: 3 |
| down_stride | stride of the downsampling patch embedding. Default: 2 |
| down_pad | padding of the downsampling patch embedding. Default: 1 |
| drop_rate | dropout rate of the layer before the main classifier. Default: 0. |
| drop_path_rate | stochastic depth rate. Default: 0. |
| layer_scale_init_value | LayerScale initial value. Default: 1e-5 |
| fork_feat | whether to output the features of the 4 stages, for dense prediction. Default: False |
Source code in mindcv\models\poolformer.py
(lines 204–321)
mindcv.models.poolformer.PoolFormer.cls_init_weights()
¶
Initialize weights for cells.
Source code in mindcv\models\poolformer.py
(lines 291–299)
mindcv.models.poolformer.PoolFormerBlock
¶
Bases: Cell
Implementation of one PoolFormer block.
Source code in mindcv\models\poolformer.py
(lines 136–175)
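The token mixer inside a PoolFormer block is simply average pooling followed by subtraction of the input. A minimal NumPy sketch on a (H, W, C) feature map, assuming stride-1 "same" pooling with padding excluded from the average (the count_include_pad=False convention of the reference implementation):

```python
import numpy as np

def pool_mixer(x, pool_size=3):
    """PoolFormer-style token mixer: AvgPool(x) - x, where the
    average at each position is taken over in-bounds neighbors only."""
    h, w, _ = x.shape
    r = pool_size // 2
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            i0, i1 = max(0, i - r), min(h, i + r + 1)
            j0, j1 = max(0, j - r), min(w, j + r + 1)
            out[i, j] = x[i0:i1, j0:j1].mean(axis=(0, 1))
    return out - x
```

On a constant feature map the mixer is exactly zero everywhere, which is why the residual branch carries all the signal in that case.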
mindcv.models.poolformer.basic_blocks(dim, index, layers, pool_size=3, mlp_ratio=4.0, act_layer=nn.GELU, norm_layer=nn.GroupNorm, drop_rate=0.0, drop_path_rate=0.0, layer_scale_init_value=1e-05)
¶
generate PoolFormer blocks for a stage
Source code in mindcv\models\poolformer.py
(lines 178–201)
mindcv.models.poolformer.poolformer_m36(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get poolformer_m36 model.
Refer to the base class models.PoolFormer for more details.
Source code in mindcv\models\poolformer.py
(lines 359–376)
mindcv.models.poolformer.poolformer_m48(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get poolformer_m48 model.
Refer to the base class models.PoolFormer for more details.
Source code in mindcv\models\poolformer.py
(lines 379–396)
mindcv.models.poolformer.poolformer_s12(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get poolformer_s12 model.
Refer to the base class models.PoolFormer for more details.
Source code in mindcv\models\poolformer.py
(lines 324–332)
mindcv.models.poolformer.poolformer_s24(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get poolformer_s24 model.
Refer to the base class models.PoolFormer for more details.
Source code in mindcv\models\poolformer.py
(lines 335–343)
mindcv.models.poolformer.poolformer_s36(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get poolformer_s36 model.
Refer to the base class models.PoolFormer for more details.
Source code in mindcv\models\poolformer.py
(lines 346–356)
pvt¶
mindcv.models.pvt
¶
MindSpore implementation of PVT.
Refer to Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions.
mindcv.models.pvt.Attention
¶
Bases: Cell
spatial-reduction attention (SRA)
Source code in mindcv\models\pvt.py
(lines 51–112)
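Spatial-reduction attention shrinks the key/value token set before computing attention, so the attention matrix is N x (N / sr^2) rather than N x N. A single-head NumPy sketch with the q/k/v projections omitted (identity) for brevity; the reduction is modeled here as sr x sr average pooling rather than mindcv's strided conv:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sra(x, h, w, sr=2):
    """Single-head spatial-reduction attention sketch.
    x: (N, C) tokens with N == h * w."""
    n, c = x.shape
    grid = x.reshape(h, w, c)
    # reduce the k/v map by sr x sr average pooling
    red = grid.reshape(h // sr, sr, w // sr, sr, c).mean(axis=(1, 3))
    kv = red.reshape(-1, c)                # (N / sr**2, C)
    attn = softmax(x @ kv.T / np.sqrt(c))  # (N, N / sr**2)
    return attn @ kv                       # (N, C)
```

The output keeps the full token count; only the keys and values are reduced, which is what makes PVT affordable on high-resolution inputs.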
mindcv.models.pvt.Block
¶
Bases: Cell
Block with spatial-reduction attention (SRA) and feed forward
Source code in mindcv\models\pvt.py
(lines 115–137)
mindcv.models.pvt.PatchEmbed
¶
Bases: Cell
Image to Patch Embedding
Source code in mindcv\models\pvt.py
(lines 140–169)
mindcv.models.pvt.PyramidVisionTransformer
¶
Bases: Cell
Pyramid Vision Transformer model class, based on
"Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions" <https://arxiv.org/abs/2102.12122>_ # noqa: E501
| PARAMETER | DESCRIPTION |
|---|---|
| img_size (int) | size of the input image. |
| patch_size | size of a single image patch. |
| in_chans | number of input channels. Default: 3. |
| num_classes | number of classification classes. Default: 1000. |
| embed_dims | hidden dim of the patch embedding in each stage. |
| num_heads | number of attention heads in each stage. |
| mlp_ratios | ratios of MLP hidden dims in each stage. |
| qkv_bias (bool) | whether to use bias in attention. |
| qk_scale (float) | scale applied to qk in attention if not None; otherwise head_dim ** -0.5. |
| drop_rate (float) | drop rate for each block. Default: 0.0. |
| attn_drop_rate (float) | drop rate for attention. Default: 0.0. |
| drop_path_rate (float) | drop rate for drop path. Default: 0.0. |
| norm_layer (nn.Cell) | norm layer used in blocks. Default: nn.LayerNorm. |
| depths | number of Blocks in each stage. |
| sr_ratios (list) | stride and kernel size of each stage's spatial-reduction attention. |
| num_stages (int) | number of stages. Default: 4. |
Source code in mindcv\models\pvt.py
(lines 172–350)
mindcv.models.pvt.pvt_large(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get PVT large model. Refer to the base class "models.PVT" for more details.
Source code in mindcv\models\pvt.py
(lines 413–430)
mindcv.models.pvt.pvt_medium(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get PVT medium model. Refer to the base class "models.PVT" for more details.
Source code in mindcv\models\pvt.py
(lines 393–410)
mindcv.models.pvt.pvt_small(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get PVT small model. Refer to the base class "models.PVT" for more details.
Source code in mindcv\models\pvt.py
(lines 373–390)
mindcv.models.pvt.pvt_tiny(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get PVT tiny model. Refer to the base class "models.PVT" for more details.
Source code in mindcv\models\pvt.py
(lines 353–370)
pvtv2¶
mindcv.models.pvtv2
¶
MindSpore implementation of PVTv2.
Refer to PVTv2: Improved Baselines with Pyramid Vision Transformer.
mindcv.models.pvtv2.Attention
¶
Bases: Cell
Linear Spatial Reduction Attention
Source code in mindcv\models\pvtv2.py
(lines 95–169)
mindcv.models.pvtv2.Block
¶
Bases: Cell
Block with Linear Spatial Reduction Attention and Convolutional Feed-Forward
Source code in mindcv\models\pvtv2.py
(lines 172–197)
mindcv.models.pvtv2.DWConv
¶
Bases: Cell
Depthwise separable convolution
Source code in mindcv\models\pvtv2.py
(lines 51–64)
mindcv.models.pvtv2.Mlp
¶
Bases: Cell
MLP with depthwise separable convolution
Source code in mindcv\models\pvtv2.py
(lines 67–92)
mindcv.models.pvtv2.OverlapPatchEmbed
¶
Bases: Cell
Overlapping Patch Embedding
Source code in mindcv\models\pvtv2.py
(lines 200–224)
mindcv.models.pvtv2.PyramidVisionTransformerV2
¶
Bases: Cell
Pyramid Vision Transformer V2 model class, based on
"PVTv2: Improved Baselines with Pyramid Vision Transformer" <https://arxiv.org/abs/2106.13797>_
| PARAMETER | DESCRIPTION |
|---|---|
| img_size (int) | size of the input image. |
| patch_size | size of a single image patch. |
| in_chans | number of input channels. Default: 3. |
| num_classes | number of classification classes. Default: 1000. |
| embed_dims | hidden dim of the patch embedding in each stage. |
| num_heads | number of attention heads in each stage. |
| mlp_ratios | ratios of MLP hidden dims in each stage. |
| qkv_bias (bool) | whether to use bias in attention. |
| qk_scale (float) | scale applied to qk in attention if not None; otherwise head_dim ** -0.5. |
| drop_rate (float) | drop rate for each block. Default: 0.0. |
| attn_drop_rate (float) | drop rate for attention. Default: 0.0. |
| drop_path_rate (float) | drop rate for drop path. Default: 0.0. |
| norm_layer (nn.Cell) | norm layer used in blocks. Default: nn.LayerNorm. |
| depths | number of Blocks in each stage. |
| sr_ratios (list) | stride and kernel size of each stage's spatial-reduction attention. |
| num_stages (int) | number of stages. Default: 4. |
| linear (bool) | whether to use linear SRA. |
Source code in mindcv\models\pvtv2.py
(lines 227–346)
mindcv.models.pvtv2.pvt_v2_b0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get PVTV2-b0 model. Refer to the base class "models.PVTv2" for more details.
Source code in mindcv\models\pvtv2.py
(lines 349–365)
mindcv.models.pvtv2.pvt_v2_b1(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get PVTV2-b1 model. Refer to the base class "models.PVTv2" for more details.
Source code in mindcv\models\pvtv2.py
(lines 368–384)
mindcv.models.pvtv2.pvt_v2_b2(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get PVTV2-b2 model. Refer to the base class "models.PVTv2" for more details.
Source code in mindcv\models\pvtv2.py
(lines 387–403)
mindcv.models.pvtv2.pvt_v2_b3(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get PVTV2-b3 model. Refer to the base class "models.PVTv2" for more details.
Source code in mindcv\models\pvtv2.py
(lines 406–421)
mindcv.models.pvtv2.pvt_v2_b4(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get PVTV2-b4 model. Refer to the base class "models.PVTv2" for more details.
Source code in mindcv\models\pvtv2.py
(lines 424–439)
mindcv.models.pvtv2.pvt_v2_b5(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get PVTV2-b5 model. Refer to the base class "models.PVTv2" for more details.
Source code in mindcv\models\pvtv2.py
(lines 442–457)
regnet¶
mindcv.models.regnet
¶
MindSpore implementation of RegNet.
Refer to: Designing Network Design Spaces
mindcv.models.regnet.AnyHead
¶
Bases: Cell
AnyNet head: optional conv, AvgPool, 1x1.
Source code in mindcv\models\regnet.py
(lines 308–326)
mindcv.models.regnet.AnyNet
¶
Bases: Cell
AnyNet model.
Source code in mindcv\models\regnet.py
(lines 354–424)
mindcv.models.regnet.AnyStage
¶
Bases: Cell
AnyNet stage (sequence of blocks w/ the same output shape).
Source code in mindcv\models\regnet.py
(lines 291–305)
mindcv.models.regnet.BasicTransform
¶
Bases: Cell
Basic transformation: [3x3 conv, BN, Relu] x2.
Source code in mindcv\models\regnet.py
(lines 192–210)
mindcv.models.regnet.BottleneckTransform
¶
Bases: Cell
Bottleneck transformation: 1x1, 3x3 [+SE], 1x1.
Source code in mindcv\models\regnet.py
(lines 230–259)
mindcv.models.regnet.RegNet
¶
Bases: AnyNet
RegNet model class, based on
"Designing Network Design Spaces" <https://arxiv.org/abs/2003.13678>_
Source code in mindcv\models\regnet.py
(lines 469–499)
mindcv.models.regnet.RegNet.regnet_get_params(w_a, w_0, w_m, d, stride, bot_mul, group_w, stem_type, stem_w, block_type, head_w, num_classes, se_r)
staticmethod
¶
Get AnyNet parameters that correspond to the RegNet.
Source code in mindcv\models\regnet.py
(lines 474–491)
mindcv.models.regnet.ResBasicBlock
¶
Bases: Cell
Residual basic block: x + f(x), f = basic transform.
Source code in mindcv\models\regnet.py
(lines 213–227)
mindcv.models.regnet.ResBottleneckBlock
¶
Bases: Cell
Residual bottleneck block: x + f(x), f = bottleneck transform.
Source code in mindcv\models\regnet.py
(lines 262–276)
mindcv.models.regnet.ResBottleneckLinearBlock
¶
Bases: Cell
Residual linear bottleneck block: x + f(x), f = bottleneck transform.
Source code in mindcv\models\regnet.py
(lines 279–288)
mindcv.models.regnet.ResStem
¶
Bases: Cell
ResNet stem for ImageNet: 7x7, BN, AF, MaxPool.
Source code in mindcv\models\regnet.py
(lines 136–151)
mindcv.models.regnet.ResStemCifar
¶
Bases: Cell
ResNet stem for CIFAR: 3x3, BN, AF.
Source code in mindcv\models\regnet.py
(lines 120–133)
mindcv.models.regnet.SimpleStem
¶
Bases: Cell
Simple stem for ImageNet: 3x3, BN, AF.
Source code in mindcv\models\regnet.py
(lines 154–167)
mindcv.models.regnet.VanillaBlock
¶
Bases: Cell
Vanilla block: [3x3 conv, BN, Relu] x2.
Source code in mindcv\models\regnet.py
(lines 170–189)
mindcv.models.regnet.activation()
¶
Helper for building an activation layer.
Source code in mindcv\models\regnet.py
(lines 115–117)
mindcv.models.regnet.adjust_block_compatibility(ws, bs, gs)
¶
Adjusts the compatibility of widths, bottlenecks, and groups.
Source code in mindcv\models\regnet.py
(lines 427–438)
mindcv.models.regnet.conv2d(w_in, w_out, k, *, stride=1, groups=1, bias=False)
¶
Helper for building a conv2d layer.
Source code in mindcv\models\regnet.py
(lines 84–88)
mindcv.models.regnet.gap2d(keep_dims=False)
¶
Helper for building a gap2d layer.
Source code in mindcv\models\regnet.py
(lines 105–107)
mindcv.models.regnet.generate_regnet(w_a, w_0, w_m, d, q=8)
¶
Generates per stage widths and depths from RegNet parameters.
Source code in mindcv\models\regnet.py
(lines 441–456)
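The width-generation rule behind generate_regnet can be sketched in pure Python: per-block widths follow the line w_0 + w_a * i, are snapped onto the geometric grid w_0 * w_m**k, then quantized to multiples of q. A reimplementation for illustration; the canonical version lives in mindcv\models\regnet.py:

```python
import math

def generate_regnet_widths(w_a, w_0, w_m, d, q=8):
    """Generate d per-block widths from RegNet parameters.
    Returns the widths and the number of distinct stages."""
    ws_cont = [w_0 + w_a * i for i in range(d)]                 # linear widths
    ks = [round(math.log(w / w_0, w_m)) for w in ws_cont]       # nearest grid exponent
    ws = [int(round(w_0 * w_m ** k / q) * q) for k in ks]       # quantize to multiples of q
    return ws, len(set(ws))
```

With the RegNetX-200MF parameters (w_a=36.44, w_0=24, w_m=2.49, d=13) this yields the stage widths 24, 56, 152, 368 over 4 stages.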
mindcv.models.regnet.generate_regnet_full(w_a, w_0, w_m, d, stride, bot_mul, group_w)
¶
Generates per stage ws, ds, gs, bs, and ss from RegNet cfg.
Source code in mindcv\models\regnet.py
(lines 459–466)
mindcv.models.regnet.get_block_fun(block_type)
¶
Retrieves the block function by name.
Source code in mindcv\models\regnet.py
(lines 341–351)
mindcv.models.regnet.get_stem_fun(stem_type)
¶
Retrieves the stem function by name.
Source code in mindcv\models\regnet.py
(lines 329–338)
mindcv.models.regnet.linear(w_in, w_out, *, bias=False)
¶
Helper for building a linear layer.
Source code in mindcv\models\regnet.py
(lines 110–112)
mindcv.models.regnet.norm2d(w_in, eps=1e-05, mom=0.9)
¶
Helper for building a norm2d layer.
Source code in mindcv\models\regnet.py
(lines 91–93)
mindcv.models.regnet.pool2d(_w_in, k, *, stride=1)
¶
Helper for building a pool2d layer.
Source code in mindcv\models\regnet.py
(lines 96–102)
repmlp¶
mindcv.models.repmlp
¶
MindSpore implementation of RepMLPNet.
Refer to RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality.
mindcv.models.repmlp.FFNBlock
¶
Bases: Cell
Common FFN layer
Source code in mindcv\models\repmlp.py
(lines 238–253)
mindcv.models.repmlp.GlobalPerceptron
¶
Bases: Cell
GlobalPerceptron layer provides global information (one of the three components of RepMLPBlock)
Source code in mindcv\models\repmlp.py
(lines 83–107)
mindcv.models.repmlp.RepMLPBlock
¶
Bases: Cell
Basic RepMLPBlock layer (composed of the Global Perceptron, Channel Perceptron and Local Perceptron)
Source code in mindcv\models\repmlp.py
(lines 110–235)
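The re-parameterization behind RepMLPBlock rests on the fact that convolution is linear: an equivalent fully-connected weight matrix can be recovered by pushing each basis image through the convolution, which is how the Local Perceptron's conv branches are merged into the FC at deploy time. A single-channel NumPy sketch (illustrative, not mindcv's implementation):

```python
import numpy as np

def conv2d_same(x, k):
    """Naive single-channel 'same' cross-correlation (pad = k//2)."""
    n = k.shape[0]
    p = n // 2
    xp = np.pad(x, p)
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + n, j:j + n] * k)
    return out

def conv_to_fc(k, h, w):
    """Build the (h*w, h*w) FC matrix equivalent to conv2d_same,
    column by column, by convolving each basis image."""
    m = np.zeros((h * w, h * w))
    for idx in range(h * w):
        basis = np.zeros(h * w)
        basis[idx] = 1.0
        m[:, idx] = conv2d_same(basis.reshape(h, w), k).ravel()
    return m
```

The resulting matrix satisfies `m @ x.ravel() == conv2d_same(x, k).ravel()` for any input, so the conv branch can be absorbed into the FC weights with no change in output.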
mindcv.models.repmlp.RepMLPNet
¶
Bases: Cell
RepMLPNet model class, based on
"RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality" <https://arxiv.org/pdf/2112.11081v2.pdf>_
| PARAMETER | DESCRIPTION |
|---|---|
| in_channels | number of input channels. Default: 3. |
| num_classes | number of classification classes. Default: 1000. |
| patch_size | size of a single image patch. Default: (4, 4). |
| num_blocks | number of blocks per stage. Default: (2, 2, 6, 2). |
| channels | in_channels (channels[stage_idx]) and out_channels (channels[stage_idx + 1]) per stage. Default: (192, 384, 768, 1536). |
| hs | feature-map height per stage. Default: (64, 32, 16, 8). |
| ws | feature-map width per stage. Default: (64, 32, 16, 8). |
| sharesets_nums | number of share sets per stage. Default: (4, 8, 16, 32). |
| reparam_conv_k | convolution kernel sizes in the Local Perceptron. Default: (3,). |
| globalperceptron_reduce | reduction ratio of the intermediate convolution in the Global Perceptron (out_channel = in_channel / globalperceptron_reduce). Default: 4. |
| use_checkpoint | whether to use checkpointing. |
| deploy | whether to build the blocks in their re-parameterized (deploy) form. |
Source code in mindcv\models\repmlp.py
(lines 276–377)
mindcv.models.repmlp.RepMLPNetUnit
¶
Bases: Cell
Basic unit of RepMLPNet
Source code in mindcv\models\repmlp.py
(lines 256–273)
mindcv.models.repmlp.repmlp_b224(pretrained=False, image_size=224, num_classes=1000, in_channels=3, deploy=False, **kwargs)
¶
Get repmlp_b224 model.
Refer to the base class models.RepMLPNet for more details.
Source code in mindcv\models\repmlp.py
(lines 418–431)
mindcv.models.repmlp.repmlp_b256(pretrained=False, image_size=256, num_classes=1000, in_channels=3, deploy=False, **kwargs)
¶
Get repmlp_b256 model.
Refer to the base class models.RepMLPNet for more details.
Source code in mindcv\models\repmlp.py
(lines 434–447)
mindcv.models.repmlp.repmlp_d256(pretrained=False, image_size=256, num_classes=1000, in_channels=3, deploy=False, **kwargs)
¶
Get repmlp_d256 model.
Refer to the base class models.RepMLPNet for more details.
Source code in mindcv\models\repmlp.py
(lines 450–463)
mindcv.models.repmlp.repmlp_l256(pretrained=False, image_size=256, num_classes=1000, in_channels=3, deploy=False, **kwargs)
¶
Get repmlp_l256 model.
Refer to the base class models.RepMLPNet for more details.
Source code in mindcv\models\repmlp.py
(lines 466–479)
mindcv.models.repmlp.repmlp_t224(pretrained=False, image_size=224, num_classes=1000, in_channels=3, deploy=False, **kwargs)
¶
Get repmlp_t224 model. Refer to the base class models.RepMLPNet for more details.
Source code in mindcv\models\repmlp.py
(lines 386–399)
mindcv.models.repmlp.repmlp_t256(pretrained=False, image_size=256, num_classes=1000, in_channels=3, deploy=False, **kwargs)
¶
Get repmlp_t256 model.
Refer to the base class models.RepMLPNet for more details.
Source code in mindcv\models\repmlp.py
(lines 402–415)
repvgg¶
mindcv.models.repvgg
¶
MindSpore implementation of RepVGG.
Refer to RepVGG: Making VGG-style ConvNets Great Again.
mindcv.models.repvgg.RepVGG
¶
Bases: Cell
RepVGG model class, based on
"RepVGGBlock: An all-MLP Architecture for Vision" <https://arxiv.org/pdf/2101.03697>_
| PARAMETER | DESCRIPTION |
|---|---|
| num_blocks | number of RepVGGBlocks per stage. |
| num_classes | number of classification classes. Default: 1000. |
| in_channels | number of input channels. Default: 3. |
| width_multiplier | width multipliers for the stages. |
| override_group_map | optional mapping from layer index to the number of groups used in that layer's conv. |
| deploy | whether to use the rbr_reparam block. Default: False. |
| use_se | whether to use the SE block. Default: False. |
Source code in mindcv\models\repvgg.py
(lines 201–291)
mindcv.models.repvgg.RepVGGBlock
¶
Bases: Cell
Basic Block of RepVGG
Source code in mindcv\models\repvgg.py
mindcv.models.repvgg.RepVGGBlock.get_custom_l2()
¶
Compute the custom L2 regularization term. This may improve accuracy and facilitate quantization in some cases.
Source code in mindcv\models\repvgg.py
mindcv.models.repvgg.RepVGGBlock.switch_to_deploy()
¶
Switch the block to its deployed (re-parameterized) form.
Source code in mindcv\models\repvgg.py
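switch_to_deploy performs RepVGG's structural re-parameterization: the trained block's parallel 3x3, 1x1 and identity branches are folded into a single 3x3 kernel that computes the same function. A minimal NumPy sketch of the kernel merge (BatchNorm folding is omitted for brevity; function and variable names are illustrative, not the mindcv API):

```python
import numpy as np

def reparameterize(k3x3, k1x1, in_ch):
    """Merge RepVGG's 3x3, 1x1 and identity branches into one 3x3 kernel.

    k3x3: (out_ch, in_ch, 3, 3) kernel of the 3x3 branch
    k1x1: (out_ch, in_ch, 1, 1) kernel of the 1x1 branch
    The identity branch only exists when out_ch == in_ch.
    """
    out_ch = k3x3.shape[0]
    # pad the 1x1 kernel to 3x3 so it lands on the center tap
    k1x1_padded = np.pad(k1x1, ((0, 0), (0, 0), (1, 1), (1, 1)))
    merged = k3x3 + k1x1_padded
    if out_ch == in_ch:
        # identity branch expressed as a centered 3x3 identity kernel
        ident = np.zeros_like(k3x3)
        for c in range(in_ch):
            ident[c, c, 1, 1] = 1.0
        merged = merged + ident
    return merged

rng = np.random.default_rng(0)
k3 = rng.standard_normal((4, 4, 3, 3))
k1 = rng.standard_normal((4, 4, 1, 1))
merged = reparameterize(k3, k1, in_ch=4)
```

The merged kernel's center tap is the sum of all three branches, while the off-center taps come only from the 3x3 branch, which is why a single conv can replace the whole block at inference time.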
mindcv.models.repvgg.repvgg_a0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get RepVGG model with num_blocks=[2, 4, 14, 1], width_multiplier=[0.75, 0.75, 0.75, 2.5].
Refer to the base class models.RepVGG for more details.
Source code in mindcv\models\repvgg.py
mindcv.models.repvgg.repvgg_a1(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get RepVGG model with num_blocks=[2, 4, 14, 1], width_multiplier=[1.0, 1.0, 1.0, 2.5].
Refer to the base class models.RepVGG for more details.
Source code in mindcv\models\repvgg.py
mindcv.models.repvgg.repvgg_a2(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get RepVGG model with num_blocks=[2, 4, 14, 1], width_multiplier=[1.5, 1.5, 1.5, 2.75].
Refer to the base class models.RepVGG for more details.
Source code in mindcv\models\repvgg.py
mindcv.models.repvgg.repvgg_b0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get RepVGG model with num_blocks=[4, 6, 16, 1], width_multiplier=[1.0, 1.0, 1.0, 2.5].
Refer to the base class models.RepVGG for more details.
Source code in mindcv\models\repvgg.py
mindcv.models.repvgg.repvgg_b1(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get RepVGG model with num_blocks=[4, 6, 16, 1], width_multiplier=[2.0, 2.0, 2.0, 4.0].
Refer to the base class models.RepVGG for more details.
Source code in mindcv\models\repvgg.py
mindcv.models.repvgg.repvgg_b1g2(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get RepVGG model with num_blocks=[4, 6, 16, 1], width_multiplier=[2.0, 2.0, 2.0, 4.0] and 2-group convolutions.
Refer to the base class models.RepVGG for more details.
Source code in mindcv\models\repvgg.py
mindcv.models.repvgg.repvgg_b1g4(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get RepVGG model with num_blocks=[4, 6, 16, 1], width_multiplier=[2.0, 2.0, 2.0, 4.0] and 4-group convolutions.
Refer to the base class models.RepVGG for more details.
Source code in mindcv\models\repvgg.py
mindcv.models.repvgg.repvgg_b2(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get RepVGG model with num_blocks=[4, 6, 16, 1], width_multiplier=[2.5, 2.5, 2.5, 5.0].
Refer to the base class models.RepVGG for more details.
Source code in mindcv\models\repvgg.py
mindcv.models.repvgg.repvgg_b2g4(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get RepVGG model with num_blocks=[4, 6, 16, 1], width_multiplier=[2.5, 2.5, 2.5, 5.0] and 4-group convolutions.
Refer to the base class models.RepVGG for more details.
Source code in mindcv\models\repvgg.py
mindcv.models.repvgg.repvgg_b3(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get RepVGG model with num_blocks=[4, 6, 16, 1], width_multiplier=[3.0, 3.0, 3.0, 5.0].
Refer to the base class models.RepVGG for more details.
Source code in mindcv\models\repvgg.py
mindcv.models.repvgg.repvgg_model_convert(model, save_path=None, do_copy=True)
¶
Convert a training-time RepVGG model into its inference-time (re-parameterized) counterpart, optionally saving the result to save_path.
Source code in mindcv\models\repvgg.py
res2net¶
mindcv.models.res2net
¶
MindSpore implementation of Res2Net.
Refer to Res2Net: A New Multi-scale Backbone Architecture.
mindcv.models.res2net.Res2Net
¶
Bases: Cell
Res2Net model class, based on
"Res2Net: A New Multi-scale Backbone Architecture" <https://arxiv.org/abs/1904.01169>_
| PARAMETER | DESCRIPTION |
|---|---|
block |
block of resnet.
TYPE:
|
layer_nums |
number of layers of each stage.
TYPE:
|
version |
variety of Res2Net, 'res2net' or 'res2net_v1b'. Default: 'res2net'.
TYPE:
|
num_classes |
number of classification classes. Default: 1000.
TYPE:
|
in_channels |
number of channels of the input. Default: 3.
TYPE:
|
groups |
number of groups for group conv in blocks. Default: 1.
TYPE:
|
base_width |
base width of per-group hidden channels in blocks. Default: 26.
TYPE:
|
scale |
scale factor of Bottle2neck. Default: 4.
DEFAULT:
|
norm |
normalization layer in blocks. Default: None.
TYPE:
|
Source code in mindcv\models\res2net.py
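The `scale` parameter is the heart of Res2Net: the Bottle2neck splits its channels into `scale` groups and chains them with hierarchical residual-like connections, so each later group sees an increasingly large receptive field. A toy NumPy sketch of that data flow (each group's 3x3 conv is stood in by a simple halving function; names are illustrative, not the mindcv API):

```python
import numpy as np

def bottle2neck_splits(x, scale=4, conv=lambda t: t * 0.5):
    """Hierarchical connections of Res2Net's Bottle2neck.

    x: (channels, ...) feature map; channels must divide evenly by `scale`.
    `conv` stands in for each group's 3x3 conv (a toy halving here).
    Group 0 passes through; group 1 is convolved alone; each later
    group i is conv(x_i + y_{i-1}), mixing in the previous output.
    """
    xs = np.split(x, scale, axis=0)
    ys = [xs[0]]                    # first split: identity
    y = conv(xs[1])                 # second split: conv only
    ys.append(y)
    for i in range(2, scale):       # later splits: conv(x_i + previous y)
        y = conv(xs[i] + y)
        ys.append(y)
    return np.concatenate(ys, axis=0)

x = np.ones((8, 2))
out = bottle2neck_splits(x, scale=4)
```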
mindcv.models.res2net.res2net101(pretrained=False, num_classes=1001, in_channels=3, **kwargs)
¶
Get 101 layers Res2Net model.
Refer to the base class models.Res2Net for more details.
Source code in mindcv\models\res2net.py
mindcv.models.res2net.res2net152(pretrained=False, num_classes=1001, in_channels=3, **kwargs)
¶
Get 152 layers Res2Net model.
Refer to the base class models.Res2Net for more details.
Source code in mindcv\models\res2net.py
mindcv.models.res2net.res2net50(pretrained=False, num_classes=1001, in_channels=3, **kwargs)
¶
Get 50 layers Res2Net model.
Refer to the base class models.Res2Net for more details.
Source code in mindcv\models\res2net.py
resnest¶
mindcv.models.resnest
¶
MindSpore implementation of ResNeSt.
Refer to ResNeSt: Split-Attention Networks.
mindcv.models.resnest.Bottleneck
¶
Bases: Cell
ResNeSt Bottleneck
Source code in mindcv\models\resnest.py
mindcv.models.resnest.ResNeSt
¶
Bases: Cell
ResNeSt model class, based on
"ResNeSt: Split-Attention Networks" <https://arxiv.org/abs/2004.08955>_
| PARAMETER | DESCRIPTION |
|---|---|
block |
Class for the residual block. Option is Bottleneck.
TYPE:
|
layers |
Numbers of layers in each block.
TYPE:
|
radix |
Number of groups for Split-Attention conv. Default: 1.
TYPE:
|
group |
Number of groups for the conv in each bottleneck block. Default: 1.
TYPE:
|
bottleneck_width |
bottleneck channels factor. Default: 64.
TYPE:
|
num_classes |
Number of classification classes. Default: 1000.
TYPE:
|
dilated |
Applying dilation strategy to pretrained ResNeSt yielding a stride-8 model, typically used in Semantic Segmentation. Default: False.
TYPE:
|
dilation |
Number of dilation in the conv. Default: 1.
TYPE:
|
deep_stem |
use a deep stem of three 3x3 convolution layers with widths stem_width, stem_width and stem_width * 2 instead of a single 7x7 convolution. Default: False.
TYPE:
|
stem_width |
number of channels in stem convolutions. Default: 64.
TYPE:
|
avg_down |
use average pooling for the projection shortcut when downsampling between stages. Default: False.
TYPE:
|
avd |
use average pooling for downsampling inside the bottleneck blocks. Default: False.
TYPE:
|
avd_first |
apply the average pooling before (True) or after (False) the split-attention conv. Default: False.
TYPE:
|
drop_rate |
Drop probability for the Dropout layer. Default: 0.
TYPE:
|
norm_layer |
Normalization layer used in backbone network. Default: nn.BatchNorm2d.
TYPE:
|
Source code in mindcv\models\resnest.py
mindcv.models.resnest.SplitAttn
¶
Bases: Cell
Split-Attention Conv2d
Source code in mindcv\models\resnest.py
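Split-Attention computes a per-channel softmax over the `radix` splits of a cardinal group and sums the splits with those weights. A toy NumPy sketch that collapses the real module's pooling and FC layers into using the pooled features directly as attention logits (illustrative only, not the mindcv implementation):

```python
import numpy as np

def split_attention(x, radix=2):
    """Toy sketch of ResNeSt's Split-Attention over `radix` splits.

    x: (radix, channels) -- one globally pooled feature vector per split
    (the real module pools spatially and runs two FC layers to produce
    the logits; here the pooled features themselves act as logits).
    Returns the attention-weighted sum of the splits.
    """
    logits = x  # stand-in for the FC attention branch
    # softmax across the radix dimension, independently per channel
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    attn = e / e.sum(axis=0, keepdims=True)
    return (attn * x).sum(axis=0)

x = np.array([[1.0, 0.0],
              [0.0, 1.0]])  # two splits, two channels
out = split_attention(x)
```

Each channel ends up dominated by whichever split activates it most strongly, which is what lets the block softly "select" among its splits.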
resnet¶
mindcv.models.resnet
¶
MindSpore implementation of ResNet.
Refer to Deep Residual Learning for Image Recognition.
mindcv.models.resnet.BasicBlock
¶
Bases: Cell
Basic block of ResNet.
Source code in mindcv\models\resnet.py
mindcv.models.resnet.Bottleneck
¶
Bases: Cell
Bottleneck here places the stride for downsampling at the 3x3 convolution (self.conv2), as torchvision does, while the original implementation places the stride at the first 1x1 convolution (self.conv1).
Source code in mindcv\models\resnet.py
mindcv.models.resnet.ResNet
¶
Bases: Cell
ResNet model class, based on
"Deep Residual Learning for Image Recognition" <https://arxiv.org/abs/1512.03385>_
| PARAMETER | DESCRIPTION |
|---|---|
block |
block of resnet.
TYPE:
|
layers |
number of layers of each stage.
TYPE:
|
num_classes |
number of classification classes. Default: 1000.
TYPE:
|
in_channels |
number of channels of the input. Default: 3.
TYPE:
|
groups |
number of groups for group conv in blocks. Default: 1.
TYPE:
|
base_width |
base width of per-group hidden channels in blocks. Default: 64.
TYPE:
|
norm |
normalization layer in blocks. Default: None.
TYPE:
|
Source code in mindcv\models\resnet.py
mindcv.models.resnet.ResNet.forward_features(x)
¶
Network forward feature extraction.
Source code in mindcv\models\resnet.py
mindcv.models.resnet.resnet101(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get 101 layers ResNet model.
Refer to the base class models.ResNet for more details.
Source code in mindcv\models\resnet.py
mindcv.models.resnet.resnet152(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get 152 layers ResNet model.
Refer to the base class models.ResNet for more details.
Source code in mindcv\models\resnet.py
mindcv.models.resnet.resnet18(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get 18 layers ResNet model.
Refer to the base class models.ResNet for more details.
Source code in mindcv\models\resnet.py
mindcv.models.resnet.resnet34(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get 34 layers ResNet model.
Refer to the base class models.ResNet for more details.
Source code in mindcv\models\resnet.py
mindcv.models.resnet.resnet50(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get 50 layers ResNet model.
Refer to the base class models.ResNet for more details.
Source code in mindcv\models\resnet.py
mindcv.models.resnet.resnext101_32x4d(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get 101 layers ResNeXt model with 32 groups of GPConv.
Refer to the base class models.ResNet for more details.
Source code in mindcv\models\resnet.py
mindcv.models.resnet.resnext101_64x4d(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get 101 layers ResNeXt model with 64 groups of GPConv.
Refer to the base class models.ResNet for more details.
Source code in mindcv\models\resnet.py
mindcv.models.resnet.resnext50_32x4d(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get 50 layers ResNeXt model with 32 groups of GPConv.
Refer to the base class models.ResNet for more details.
Source code in mindcv\models\resnet.py
resnetv2¶
mindcv.models.resnetv2
¶
MindSpore implementation of ResNetV2.
Refer to Identity Mappings in Deep Residual Networks.
mindcv.models.resnetv2.resnetv2_101(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get 101 layers ResNetV2 model.
Refer to the base class models.ResNet for more details.
Source code in mindcv\models\resnetv2.py
mindcv.models.resnetv2.resnetv2_50(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get 50 layers ResNetV2 model.
Refer to the base class models.ResNet for more details.
Source code in mindcv\models\resnetv2.py
rexnet¶
mindcv.models.rexnet
¶
MindSpore implementation of ReXNet.
Refer to ReXNet: Rethinking Channel Dimensions for Efficient Model Design.
mindcv.models.rexnet.LinearBottleneck
¶
Bases: Cell
LinearBottleneck
Source code in mindcv\models\rexnet.py
mindcv.models.rexnet.ReXNetV1
¶
Bases: Cell
ReXNet model class, based on
"Rethinking Channel Dimensions for Efficient Model Design" <https://arxiv.org/abs/2007.00992>_
| PARAMETER | DESCRIPTION |
|---|---|
in_channels |
number of the input channels. Default: 3.
TYPE:
|
fi_channels |
number of the final channels. Default: 180.
TYPE:
|
initial_channels |
initialize inplanes. Default: 16.
TYPE:
|
width_mult |
The ratio of the channel. Default: 1.0.
TYPE:
|
depth_mult |
The ratio of num_layers. Default: 1.0.
TYPE:
|
num_classes |
number of classification classes. Default: 1000.
TYPE:
|
use_se |
use SENet in LinearBottleneck. Default: True.
TYPE:
|
se_ratio |
SENet reduction ratio. Default: 1/12.
DEFAULT:
|
drop_rate |
dropout ratio. Default: 0.2.
TYPE:
|
ch_div |
divisible by ch_div. Default: 1.
TYPE:
|
act_layer |
activation function in ConvNormAct. Default: nn.SiLU.
TYPE:
|
dw_act_layer |
activation function after dw_conv. Default: nn.ReLU6.
TYPE:
|
cls_useconv |
use conv in classification. Default: False.
TYPE:
|
Source code in mindcv\models\rexnet.py
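width_mult scales every stage's channel count, and ch_div keeps the result divisible by a fixed number for hardware efficiency. A sketch of the usual MobileNet-style rounding rule that this kind of scaling relies on (pure Python; mindcv's exact rounding may differ, so treat this as illustrative):

```python
def make_divisible(v, divisor=8, min_value=None):
    """Round a scaled channel count to the nearest multiple of `divisor`
    without shrinking it by more than 10% (MobileNet-style rule)."""
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:  # never drop more than 10% of the channels
        new_v += divisor
    return new_v

# scale a base width of 32 channels by several width multipliers
widths = [make_divisible(32 * wm) for wm in (0.9, 1.0, 1.3, 1.5, 2.0)]
```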
mindcv.models.rexnet.rexnet_09(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get ReXNet model with width multiplier of 0.9.
Refer to the base class models.ReXNetV1 for more details.
Source code in mindcv\models\rexnet.py
mindcv.models.rexnet.rexnet_10(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get ReXNet model with width multiplier of 1.0.
Refer to the base class models.ReXNetV1 for more details.
Source code in mindcv\models\rexnet.py
mindcv.models.rexnet.rexnet_13(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get ReXNet model with width multiplier of 1.3.
Refer to the base class models.ReXNetV1 for more details.
Source code in mindcv\models\rexnet.py
mindcv.models.rexnet.rexnet_15(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get ReXNet model with width multiplier of 1.5.
Refer to the base class models.ReXNetV1 for more details.
Source code in mindcv\models\rexnet.py
mindcv.models.rexnet.rexnet_20(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get ReXNet model with width multiplier of 2.0.
Refer to the base class models.ReXNetV1 for more details.
Source code in mindcv\models\rexnet.py
senet¶
mindcv.models.senet
¶
MindSpore implementation of SENet.
Refer to Squeeze-and-Excitation Networks.
mindcv.models.senet.Bottleneck
¶
Bases: Cell
Base bottleneck class for SENet, SEResNet and SEResNeXt that implements the construct method.
Source code in mindcv\models\senet.py
mindcv.models.senet.SEBottleneck
¶
Bases: Bottleneck
Define the Bottleneck for SENet154.
Source code in mindcv\models\senet.py
mindcv.models.senet.SENet
¶
Bases: Cell
SENet model class, based on
"Squeeze-and-Excitation Networks" <https://arxiv.org/abs/1709.01507>_
| PARAMETER | DESCRIPTION |
|---|---|
block |
block class of SENet.
TYPE:
|
layers |
Number of residual blocks for 4 layers.
TYPE:
|
group |
Number of groups for the conv in each bottleneck block.
TYPE:
|
reduction |
Reduction ratio for Squeeze-and-Excitation modules.
TYPE:
|
drop_rate |
Drop probability for the Dropout layer. Default: 0.
TYPE:
|
in_channels |
number of channels of the input. Default: 3.
TYPE:
|
inplanes |
Number of input channels for layer1. Default: 64.
TYPE:
|
input3x3 |
If True, use three 3x3 convolutions in layer0 instead of a single 7x7 convolution.
TYPE:
|
downsample_kernel_size |
Kernel size for downsampling convolutions. Default: 1.
TYPE:
|
downsample_padding |
Padding for downsampling convolutions. Default: 0.
TYPE:
|
num_classes |
number of classification classes. Default: 1000.
TYPE:
|
Source code in mindcv\models\senet.py
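A Squeeze-and-Excitation module globally pools each channel, passes the pooled vector through a two-layer FC bottleneck (whose width is controlled by the `reduction` ratio above), and rescales the channels with the resulting sigmoid gate. A toy NumPy sketch (function and weight names are illustrative, not the mindcv API):

```python
import numpy as np

def squeeze_excite(x, w1, w2):
    """Toy Squeeze-and-Excitation: global average pool, two FC layers
    (ReLU then sigmoid), then channel-wise rescaling.

    x: (channels, h, w); w1: (channels // r, channels); w2: (channels, channels // r)
    """
    s = x.mean(axis=(1, 2))                    # squeeze: (channels,)
    z = np.maximum(w1 @ s, 0.0)                # excitation FC + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ z)))    # FC + sigmoid, each in (0, 1)
    return x * gate[:, None, None]             # rescale each channel

c, r = 4, 2
x = np.ones((c, 3, 3))
w1 = np.zeros((c // r, c))   # zero weights -> gate is sigmoid(0) = 0.5
w2 = np.zeros((c, c // r))
out = squeeze_excite(x, w1, w2)
```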
mindcv.models.senet.SEResNeXtBottleneck
¶
Bases: Bottleneck
Define the ResNeXt bottleneck type C with a Squeeze-and-Excitation module.
Source code in mindcv\models\senet.py
mindcv.models.senet.SEResNetBlock
¶
Bases: Cell
Define the basic block of resnet with a Squeeze-and-Excitation module.
Source code in mindcv\models\senet.py
mindcv.models.senet.SEResNetBottleneck
¶
Bases: Bottleneck
Define the ResNet bottleneck with a Squeeze-and-Excitation module; this variant is used in the torchvision implementation of ResNet.
Source code in mindcv\models\senet.py
shufflenetv1¶
mindcv.models.shufflenetv1
¶
MindSpore implementation of ShuffleNetV1.
Refer to ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
mindcv.models.shufflenetv1.GroupConv
¶
Bases: Cell
Group convolution operation.
Because MindSpore does not support the group convolution needed by ShuffleNet, the group convolution is defined manually instead of using the original nn.Conv2d with its group parameter changed.
| PARAMETER | DESCRIPTION |
|---|---|
in_channels |
Input channels of feature map.
TYPE:
|
out_channels |
Output channels of feature map.
TYPE:
|
kernel_size |
Size of convolution kernel.
TYPE:
|
stride |
Stride size for the group convolution layer.
TYPE:
|
pad_mode |
Specifies padding mode.
TYPE:
|
pad |
The number of padding on the height and width directions of the input.
TYPE:
|
groups |
Splits the filter into groups; in_channels and out_channels must be divisible by the number of groups.
TYPE:
|
has_bias |
Whether the Conv2d layer has a bias parameter.
TYPE:
|
Source code in mindcv\models\shufflenetv1.py
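The manual group convolution above splits the input channels into `groups` chunks, convolves each chunk with its own filter bank, and concatenates the results. A toy NumPy sketch for the 1x1 case (illustrative, not the mindcv implementation):

```python
import numpy as np

def group_conv1x1(x, w, groups):
    """1x1 group convolution: split input channels into `groups`,
    apply each group's own weights, then concatenate along channels.

    x: (in_ch, h, w); w: (out_ch, in_ch // groups); both in_ch and
    out_ch must be divisible by `groups`.
    """
    xs = np.split(x, groups, axis=0)
    ws = np.split(w, groups, axis=0)  # (out_ch // groups, in_ch // groups) each
    outs = [np.einsum('oc,chw->ohw', wg, xg) for wg, xg in zip(ws, xs)]
    return np.concatenate(outs, axis=0)

x = np.ones((4, 2, 2))   # 4 input channels
w = np.ones((4, 2))      # 4 output channels total, 2 in-channels per group
out = group_conv1x1(x, w, groups=2)
```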
mindcv.models.shufflenetv1.ShuffleNetV1
¶
Bases: Cell
ShuffleNetV1 model class, based on
"ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices" <https://arxiv.org/abs/1707.01083>_ # noqa: E501
| PARAMETER | DESCRIPTION |
|---|---|
num_classes |
number of classification classes. Default: 1000.
TYPE:
|
in_channels |
number of input channels. Default: 3.
TYPE:
|
model_size |
scale factor which controls the number of channels. Default: '2.0x'.
TYPE:
|
group |
number of group for group convolution. Default: 3.
TYPE:
|
Source code in mindcv\models\shufflenetv1.py
mindcv.models.shufflenetv1.ShuffleV1Block
¶
Bases: Cell
Basic block of ShuffleNetV1: 1x1 group conv -> channel shuffle -> 3x3 depthwise conv -> 1x1 group conv.
Source code in mindcv\models\shufflenetv1.py
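The channel shuffle step between the group convs is just a reshape-transpose-reshape that interleaves channels across groups, letting information flow between otherwise isolated groups. A NumPy sketch (illustrative):

```python
import numpy as np

def channel_shuffle(x, groups):
    """Interleave channels across groups: view the channel axis as
    (groups, channels // groups), transpose, and flatten back."""
    c = x.shape[0]
    assert c % groups == 0
    return (x.reshape(groups, c // groups, *x.shape[1:])
             .swapaxes(0, 1)
             .reshape(c, *x.shape[1:]))

# channels tagged 0..5 so the permutation is visible
x = np.arange(6, dtype=float).reshape(6, 1, 1)
out = channel_shuffle(x, groups=2)
order = [int(v) for v in out.ravel()]
```

With two groups, channels [0, 1, 2 | 3, 4, 5] become [0, 3, 1, 4, 2, 5]: every output group now mixes channels from both input groups.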
mindcv.models.shufflenetv1.shufflenet_v1_g3_05(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get ShuffleNetV1 model with width scaled by 0.5 and 3 groups of GPConv.
Refer to the base class models.ShuffleNetV1 for more details.
Source code in mindcv\models\shufflenetv1.py
mindcv.models.shufflenetv1.shufflenet_v1_g3_10(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get ShuffleNetV1 model with width scaled by 1.0 and 3 groups of GPConv.
Refer to the base class models.ShuffleNetV1 for more details.
Source code in mindcv\models\shufflenetv1.py
mindcv.models.shufflenetv1.shufflenet_v1_g3_15(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get ShuffleNetV1 model with width scaled by 1.5 and 3 groups of GPConv.
Refer to the base class models.ShuffleNetV1 for more details.
Source code in mindcv\models\shufflenetv1.py
mindcv.models.shufflenetv1.shufflenet_v1_g3_20(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get ShuffleNetV1 model with width scaled by 2.0 and 3 groups of GPConv.
Refer to the base class models.ShuffleNetV1 for more details.
Source code in mindcv\models\shufflenetv1.py
mindcv.models.shufflenetv1.shufflenet_v1_g8_05(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get ShuffleNetV1 model with width scaled by 0.5 and 8 groups of GPConv.
Refer to the base class models.ShuffleNetV1 for more details.
Source code in mindcv\models\shufflenetv1.py
mindcv.models.shufflenetv1.shufflenet_v1_g8_10(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get ShuffleNetV1 model with width scaled by 1.0 and 8 groups of GPConv.
Refer to the base class models.ShuffleNetV1 for more details.
Source code in mindcv\models\shufflenetv1.py
mindcv.models.shufflenetv1.shufflenet_v1_g8_15(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get ShuffleNetV1 model with width scaled by 1.5 and 8 groups of GPConv.
Refer to the base class models.ShuffleNetV1 for more details.
Source code in mindcv\models\shufflenetv1.py
mindcv.models.shufflenetv1.shufflenet_v1_g8_20(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get ShuffleNetV1 model with width scaled by 2.0 and 8 groups of GPConv.
Refer to the base class models.ShuffleNetV1 for more details.
Source code in mindcv\models\shufflenetv1.py
shufflenetv2¶
mindcv.models.shufflenetv2
¶
MindSpore implementation of ShuffleNetV2.
Refer to ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
mindcv.models.shufflenetv2.ShuffleNetV2
¶
Bases: Cell
ShuffleNetV2 model class, based on
"ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design" <https://arxiv.org/abs/1807.11164>_
| PARAMETER | DESCRIPTION |
|---|---|
num_classes |
number of classification classes. Default: 1000.
TYPE:
|
in_channels |
number of input channels. Default: 3.
TYPE:
|
model_size |
scale factor which controls the number of channels. Default: '1.5x'.
TYPE:
|
Source code in mindcv\models\shufflenetv2.py
mindcv.models.shufflenetv2.ShuffleV2Block
¶
Bases: Cell
Basic block of ShuffleNetV2.
Source code in mindcv\models\shufflenetv2.py
mindcv.models.shufflenetv2.shufflenet_v2_x0_5(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get ShuffleNetV2 model with width scaled by 0.5.
Refer to the base class models.ShuffleNetV2 for more details.
Source code in mindcv\models\shufflenetv2.py
mindcv.models.shufflenetv2.shufflenet_v2_x1_0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get ShuffleNetV2 model with width scaled by 1.0.
Refer to the base class models.ShuffleNetV2 for more details.
Source code in mindcv\models\shufflenetv2.py
mindcv.models.shufflenetv2.shufflenet_v2_x1_5(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get ShuffleNetV2 model with width scaled by 1.5.
Refer to the base class models.ShuffleNetV2 for more details.
Source code in mindcv\models\shufflenetv2.py
mindcv.models.shufflenetv2.shufflenet_v2_x2_0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get ShuffleNetV2 model with width scaled by 2.0.
Refer to the base class models.ShuffleNetV2 for more details.
Source code in mindcv\models\shufflenetv2.py
sknet¶
mindcv.models.sknet
¶
MindSpore implementation of SKNet.
Refer to Selective Kernel Networks.
mindcv.models.sknet.SKNet
¶
Bases: ResNet
SKNet model class, based on
"Selective Kernel Networks" <https://arxiv.org/abs/1903.06586>_
| PARAMETER | DESCRIPTION |
|---|---|
block |
block of sknet.
TYPE:
|
layers |
number of layers of each stage.
TYPE:
|
num_classes |
number of classification classes. Default: 1000.
TYPE:
|
in_channels |
number of channels of the input. Default: 3.
TYPE:
|
groups |
number of groups for group conv in blocks. Default: 1.
TYPE:
|
base_width |
base width of per-group hidden channels in blocks. Default: 64.
TYPE:
|
norm |
normalization layer in blocks. Default: None.
TYPE:
|
sk_kwargs |
kwargs of selective kernel. Default: None.
TYPE:
|
Source code in mindcv\models\sknet.py
mindcv.models.sknet.SelectiveKernelBasic
¶
Bases: Cell
Basic block of SKNet.
Source code in mindcv\models\sknet.py
mindcv.models.sknet.SelectiveKernelBottleneck
¶
Bases: Cell
Bottleneck block of SKNet.
Source code in mindcv\models\sknet.py
mindcv.models.sknet.skresnet18(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get 18 layers SKNet model.
Refer to the base class models.SKNet for more details.
Source code in mindcv\models\sknet.py
mindcv.models.sknet.skresnet34(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get 34 layers SKNet model.
Refer to the base class models.SKNet for more details.
Source code in mindcv\models\sknet.py
mindcv.models.sknet.skresnet50(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get 50 layers SKNet model.
Refer to the base class models.SKNet for more details.
Source code in mindcv\models\sknet.py, lines 250-263
mindcv.models.sknet.skresnext50_32x4d(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get 50 layers SKNeXt model with 32 groups of GPConv.
Refer to the base class models.SKNet for more details.
Source code in mindcv\models\sknet.py, lines 266-279
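The selective-kernel (SK) unit underlying these models fuses multiple convolution branches with softmax attention weights. A minimal pure-Python sketch of that fusion step for a single channel, assuming two branch responses and their attention logits (the function name `sk_fuse` and the numbers are illustrative, not part of mindcv):

```python
import math

def sk_fuse(branch_a, branch_b, logit_a, logit_b):
    """Fuse two branch responses with softmax attention weights,
    as in a selective-kernel unit (per-channel illustration)."""
    m = max(logit_a, logit_b)  # subtract max for numerical stability
    ea, eb = math.exp(logit_a - m), math.exp(logit_b - m)
    wa, wb = ea / (ea + eb), eb / (ea + eb)
    return wa * branch_a + wb * branch_b, (wa, wb)

# Branch responses 1.0 and 3.0; the second branch has the larger logit,
# so the fused value leans toward 3.0.
fused, (wa, wb) = sk_fuse(1.0, 3.0, 0.5, 1.5)
```

The real SKNet computes the branch responses with 3x3 and 5x5 group convolutions and the logits with a small squeeze-excite-style MLP; only the softmax fusion is shown here.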
squeezenet¶
mindcv.models.squeezenet
¶
MindSpore implementation of SqueezeNet.
Refer to SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size.
mindcv.models.squeezenet.Fire
¶
Bases: Cell
Define the basic building block (Fire module) of SqueezeNet.
Source code in mindcv\models\squeezenet.py, lines 37-58
mindcv.models.squeezenet.SqueezeNet
¶
Bases: Cell
SqueezeNet model class, based on
"SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size" <https://arxiv.org/abs/1602.07360>_ # noqa: E501
.. note:: Important: In contrast to the other models the inception_v3 expects tensors with a size of N x 3 x 227 x 227, so ensure your images are sized accordingly.
| PARAMETER | DESCRIPTION |
|---|---|
version |
version of the architecture, '1_0' or '1_1'. Default: '1_0'.
TYPE:
|
num_classes |
number of classification classes. Default: 1000.
TYPE:
|
drop_rate |
dropout rate of the classifier. Default: 0.5.
TYPE:
|
in_channels |
number of channels of the input. Default: 3.
TYPE:
|
Source code in mindcv\models\squeezenet.py, lines 61-150
mindcv.models.squeezenet.squeezenet1_0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get SqueezeNet model of version 1.0.
Refer to the base class models.SqueezeNet for more details.
Source code in mindcv\models\squeezenet.py, lines 153-164
mindcv.models.squeezenet.squeezenet1_1(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get SqueezeNet model of version 1.1.
Refer to the base class models.SqueezeNet for more details.
Source code in mindcv\models\squeezenet.py, lines 167-178
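A Fire module squeezes the input with 1x1 convolutions and then expands it with parallel 1x1 and 3x3 convolutions whose outputs are concatenated along the channel axis. A small sketch of the resulting channel arithmetic (`fire_out_channels` is an illustrative helper, not a mindcv API; the 64 + 64 figures are the fire2 configuration from the SqueezeNet paper):

```python
def fire_out_channels(expand1x1_planes, expand3x3_planes):
    # The two expand branches are concatenated along the channel axis,
    # so the Fire module's output width is simply their sum.
    return expand1x1_planes + expand3x3_planes

# fire2 in SqueezeNet 1.0: squeeze to 16 channels, then expand 64 + 64
print(fire_out_channels(64, 64))  # 128
```

Note that the squeeze width (16 here) only bounds the input to the expand branches; it does not appear in the output width.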
swintransformer¶
mindcv.models.swintransformer
¶
Define SwinTransformer model
mindcv.models.swintransformer.BasicLayer
¶
Bases: Cell
A basic Swin Transformer layer for one stage.
| PARAMETER | DESCRIPTION |
|---|---|
dim |
Number of input channels.
TYPE:
|
input_resolution |
Input resolution.
TYPE:
|
depth |
Number of blocks.
TYPE:
|
num_heads |
Number of attention heads.
TYPE:
|
window_size |
Local window size.
TYPE:
|
mlp_ratio |
Ratio of mlp hidden dim to embedding dim.
TYPE:
|
qkv_bias |
If True, add a learnable bias to query, key, value. Default: True
TYPE:
|
qk_scale |
Override default qk scale of head_dim ** -0.5 if set.
TYPE:
|
drop |
Dropout rate. Default: 0.0
TYPE:
|
attn_drop |
Attention dropout rate. Default: 0.0
TYPE:
|
drop_path |
Stochastic depth rate. Default: 0.0
TYPE:
|
norm_layer |
Normalization layer. Default: nn.LayerNorm
TYPE:
|
downsample |
Downsample layer at the end of the layer. Default: None
TYPE:
|
Source code in mindcv\models\swintransformer.py, lines 446-512
mindcv.models.swintransformer.PatchEmbed
¶
Bases: Cell
Image to Patch Embedding
| PARAMETER | DESCRIPTION |
|---|---|
image_size |
Image size. Default: 224.
TYPE:
|
patch_size |
Patch token size. Default: 4.
TYPE:
|
in_chans |
Number of input image channels. Default: 3.
TYPE:
|
embed_dim |
Number of linear projection output channels. Default: 96.
TYPE:
|
norm_layer |
Normalization layer. Default: None
TYPE:
|
Source code in mindcv\models\swintransformer.py, lines 515-564
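With non-overlapping patches, the token count follows directly from the image and patch sizes. A quick sketch of that arithmetic (`num_patches` is an illustrative helper, not a mindcv API):

```python
def num_patches(image_size, patch_size):
    # Non-overlapping patches: the image is split into a
    # (image_size // patch_size) x (image_size // patch_size) grid of tokens.
    assert image_size % patch_size == 0
    grid = image_size // patch_size
    return grid * grid

# Default Swin config: 224x224 input, 4x4 patches -> a 56x56 grid of tokens
print(num_patches(224, 4))  # 3136
```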
mindcv.models.swintransformer.PatchMerging
¶
Bases: Cell
Patch Merging Layer.
| PARAMETER | DESCRIPTION |
|---|---|
input_resolution |
Resolution of input feature.
TYPE:
|
dim |
Number of input channels.
TYPE:
|
norm_layer |
Normalization layer. Default: nn.LayerNorm
TYPE:
|
Source code in mindcv\models\swintransformer.py, lines 402-443
mindcv.models.swintransformer.PatchMerging.construct(x)
¶
Source code in mindcv\models\swintransformer.py, lines 429-440
mindcv.models.swintransformer.SwinTransformer
¶
Bases: Cell
SwinTransformer model class, based on
"Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" <https://arxiv.org/pdf/2103.14030>_
| PARAMETER | DESCRIPTION |
|---|---|
image_size |
Input image size. Default: 224
TYPE:
|
patch_size |
Patch size. Default: 4
TYPE:
|
in_chans |
Number of input image channels. Default: 3
TYPE:
|
num_classes |
Number of classes for classification head. Default: 1000
TYPE:
|
embed_dim |
Patch embedding dimension. Default: 96
TYPE:
|
depths |
Depth of each Swin Transformer layer.
TYPE:
|
num_heads |
Number of attention heads in different layers.
TYPE:
|
window_size |
Window size. Default: 7
TYPE:
|
mlp_ratio |
Ratio of mlp hidden dim to embedding dim. Default: 4
TYPE:
|
qkv_bias |
If True, add a learnable bias to query, key, value. Default: True
TYPE:
|
qk_scale |
Override default qk scale of head_dim ** -0.5 if set. Default: None
TYPE:
|
drop_rate |
Dropout rate. Default: 0
TYPE:
|
attn_drop_rate |
Attention dropout rate. Default: 0
TYPE:
|
drop_path_rate |
Stochastic depth rate. Default: 0.1
TYPE:
|
norm_layer |
Normalization layer. Default: nn.LayerNorm.
TYPE:
|
ape |
If True, add absolute position embedding to the patch embedding. Default: False
TYPE:
|
patch_norm |
If True, add normalization after patch embedding. Default: True
TYPE:
|
Source code in mindcv\models\swintransformer.py, lines 567-696
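Across the four stages, each PatchMerging step halves the spatial resolution while the embedding dimension doubles. A sketch of the resulting per-stage shapes under the default configuration (`swin_stage_shapes` is an illustrative helper, not a mindcv API):

```python
def swin_stage_shapes(image_size=224, patch_size=4, embed_dim=96, num_stages=4):
    """Per-stage (resolution, channels): each PatchMerging halves the
    spatial side length and doubles the embedding dimension."""
    res = image_size // patch_size
    return [(res // (2 ** i), embed_dim * (2 ** i)) for i in range(num_stages)]

print(swin_stage_shapes())  # [(56, 96), (28, 192), (14, 384), (7, 768)]
```

The final (7, 768) stage is why the default window_size of 7 covers exactly one window at the last stage.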
mindcv.models.swintransformer.SwinTransformerBlock
¶
Bases: Cell
Swin Transformer Block.
| PARAMETER | DESCRIPTION |
|---|---|
dim |
Number of input channels.
TYPE:
|
input_resolution |
Input resolution.
TYPE:
|
num_heads |
Number of attention heads.
TYPE:
|
window_size |
Window size.
TYPE:
|
shift_size |
Shift size for SW-MSA.
TYPE:
|
mlp_ratio |
Ratio of mlp hidden dim to embedding dim.
TYPE:
|
qkv_bias |
If True, add a learnable bias to query, key, value. Default: True
TYPE:
|
qk_scale |
Override default qk scale of head_dim ** -0.5 if set.
TYPE:
|
drop |
Dropout rate. Default: 0.0
TYPE:
|
attn_drop |
Attention dropout rate. Default: 0.0
TYPE:
|
drop_path |
Stochastic depth rate. Default: 0.0
TYPE:
|
act_layer |
Activation layer. Default: nn.GELU
TYPE:
|
norm_layer |
Normalization layer. Default: nn.LayerNorm
TYPE:
|
Source code in mindcv\models\swintransformer.py, lines 248-384
mindcv.models.swintransformer.WindowAttention
¶
Bases: Cell
Window based multi-head self attention (W-MSA) Cell with relative position bias. It supports both shifted and non-shifted windows.
| PARAMETER | DESCRIPTION |
|---|---|
dim |
Number of input channels.
TYPE:
|
window_size |
The height and width of the window.
TYPE:
|
num_heads |
Number of attention heads.
TYPE:
|
qkv_bias |
If True, add a learnable bias to query, key, value. Default: True
TYPE:
|
qk_scale |
Override default qk scale of head_dim ** -0.5 if set
TYPE:
|
attn_drop |
Dropout ratio of attention weight. Default: 0.0
TYPE:
|
proj_drop |
Dropout ratio of output. Default: 0.0
TYPE:
|
Source code in mindcv\models\swintransformer.py, lines 169-245
mindcv.models.swintransformer.WindowAttention.construct(x, mask=None)
¶
| PARAMETER | DESCRIPTION |
|---|---|
x |
input features with shape of (num_windows*B, N, C)
TYPE:
|
mask |
(0/-inf) mask with shape of (num_windows, Wh*Ww, Wh*Ww) or None
TYPE:
|
Source code in mindcv\models\swintransformer.py, lines 214-242
mindcv.models.swintransformer.WindowPartition
¶
Bases: Cell
Source code in mindcv\models\swintransformer.py, lines 80-103
mindcv.models.swintransformer.WindowPartition.construct(x)
¶
| PARAMETER | DESCRIPTION |
|---|---|
x |
(b, h, w, c)
TYPE:
|
window_size |
window size
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
windows
|
Tensor(num_windows*b, window_size, window_size, c)
TYPE:
|
Source code in mindcv\models\swintransformer.py, lines 89-103
mindcv.models.swintransformer.WindowReverse
¶
Bases: Cell
Source code in mindcv\models\swintransformer.py, lines 106-128
mindcv.models.swintransformer.WindowReverse.construct(windows, window_size, h, w)
¶
| PARAMETER | DESCRIPTION |
|---|---|
windows |
(num_windows*B, window_size, window_size, C)
TYPE:
|
window_size |
Window size
TYPE:
|
h |
Height of image
TYPE:
|
w |
Width of image
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
x
|
(B, H, W, C)
TYPE:
|
Source code in mindcv\models\swintransformer.py, lines 107-128
mindcv.models.swintransformer.swin_tiny(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get SwinTransformer tiny model. Refer to the base class 'models.SwinTransformer' for more details.
Source code in mindcv\models\swintransformer.py, lines 699-714
mindcv.models.swintransformer.window_partition(x, window_size)
¶
| PARAMETER | DESCRIPTION |
|---|---|
x |
(B, H, W, C)
|
window_size |
window size
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
windows
|
Tensor(num_windows*B, window_size, window_size, C) |
Source code in mindcv\models\swintransformer.py, lines 65-77
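The shape bookkeeping of window partitioning can be checked without any tensors. An illustrative helper (not a mindcv API), assuming the feature map is divisible by the window size:

```python
def window_partition_shape(b, h, w, c, window_size):
    # (b, h, w, c) -> (num_windows * b, window_size, window_size, c)
    assert h % window_size == 0 and w % window_size == 0
    num_windows = (h // window_size) * (w // window_size)
    return (num_windows * b, window_size, window_size, c)

# Stage-1 feature map of the default config: 56x56 with window 7 -> 64 windows
print(window_partition_shape(1, 56, 56, 96, 7))  # (64, 7, 7, 96)
```

WindowReverse performs the inverse mapping, so applying both leaves the (b, h, w, c) shape unchanged.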
swintransformerv2¶
mindcv.models.swintransformerv2
¶
MindSpore implementation of SwinTransformer V2.
Refer to Swin Transformer V2: Scaling Up Capacity and Resolution.
mindcv.models.swintransformerv2.SwinTransformerV2
¶
Bases: Cell
SwinTransformerV2 model class, based on
"Swin Transformer V2: Scaling Up Capacity and Resolution" <https://arxiv.org/abs/2111.09883>_
| PARAMETER | DESCRIPTION |
|---|---|
image_size |
Input image size. Default: 256.
TYPE:
|
patch_size |
Patch size. Default: 4.
TYPE:
|
in_channels |
Number the channels of the input. Default: 3.
TYPE:
|
num_classes |
Number of classification classes. Default: 1000.
TYPE:
|
embed_dim |
Patch embedding dimension. Default: 96.
TYPE:
|
depths |
Depth of each Swin Transformer layer. Default: [2, 2, 6, 2].
TYPE:
|
num_heads |
Number of attention heads in different layers. Default: [3, 6, 12, 24].
TYPE:
|
window_size |
Window size. Default: 7.
TYPE:
|
mlp_ratio |
Ratio of mlp hidden dim to embedding dim. Default: 4.
TYPE:
|
qkv_bias |
If True, add a bias for query, key, value. Default: True.
TYPE:
|
drop_rate |
Drop probability for the Dropout layer. Default: 0.
TYPE:
|
attn_drop_rate |
Attention drop probability for the Dropout layer. Default: 0.
TYPE:
|
drop_path_rate |
Stochastic depth rate. Default: 0.1.
TYPE:
|
norm_layer |
Normalization layer. Default: nn.LayerNorm.
TYPE:
|
patch_norm |
If True, add normalization after patch embedding. Default: True.
TYPE:
|
pretrained_window_sizes |
Pretrained window sizes of each layer. Default: [0, 0, 0, 0].
TYPE:
|
Source code in mindcv\models\swintransformerv2.py, lines 520-648
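In Swin-style implementations, drop_path_rate is typically not applied uniformly: the per-block rate grows linearly from 0 to the configured maximum across all blocks. A sketch of that schedule under the default depths [2, 2, 6, 2] (`drop_path_schedule` is an illustrative helper, not a mindcv API):

```python
def drop_path_schedule(drop_path_rate=0.1, depths=(2, 2, 6, 2)):
    total = sum(depths)
    # Linearly increasing stochastic-depth rate per block:
    # 0 for the first block, drop_path_rate for the last.
    return [drop_path_rate * i / (total - 1) for i in range(total)]

rates = drop_path_schedule()
```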
vgg¶
mindcv.models.vgg
¶
MindSpore implementation of VGGNet.
Refer to Very Deep Convolutional Networks for Large-Scale Image Recognition.
mindcv.models.vgg.VGG
¶
Bases: Cell
VGGNet model class, based on
"Very Deep Convolutional Networks for Large-Scale Image Recognition" <https://arxiv.org/abs/1409.1556>_
| PARAMETER | DESCRIPTION |
|---|---|
model_name |
name of the architecture. 'vgg11', 'vgg13', 'vgg16' or 'vgg19'.
TYPE:
|
batch_norm |
use batch normalization or not. Default: False.
TYPE:
|
num_classes |
number of classification classes. Default: 1000.
TYPE:
|
in_channels |
number of channels of the input. Default: 3.
TYPE:
|
drop_rate |
dropout rate of the classifier. Default: 0.5.
TYPE:
|
Source code in mindcv\models\vgg.py, lines 72-135
mindcv.models.vgg.vgg11(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get 11 layers VGG model.
Refer to the base class models.VGG for more details.
Source code in mindcv\models\vgg.py, lines 138-149
mindcv.models.vgg.vgg13(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get 13 layers VGG model.
Refer to the base class models.VGG for more details.
Source code in mindcv\models\vgg.py, lines 152-163
mindcv.models.vgg.vgg16(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get 16 layers VGG model.
Refer to the base class models.VGG for more details.
Source code in mindcv\models\vgg.py, lines 166-177
mindcv.models.vgg.vgg19(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get 19 layers VGG model.
Refer to the base class models.VGG for more details.
Source code in mindcv\models\vgg.py, lines 180-191
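The VGG variant names count weight layers: convolution layers from the configuration plus the three fully connected classifier layers. A sketch using the standard configurations for vgg11 and vgg16 (`vgg_depth` and the `cfgs` dict are illustrative, not the mindcv internals):

```python
# Standard VGG configurations: numbers are conv output channels, "M" is max-pool.
cfgs = {
    "vgg11": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "vgg16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"],
}

def vgg_depth(cfg):
    # The model name counts weight layers: conv layers plus 3 FC layers.
    return sum(1 for v in cfg if v != "M") + 3

print(vgg_depth(cfgs["vgg16"]))  # 16
```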
visformer¶
mindcv.models.visformer
¶
MindSpore implementation of Visformer.
Refer to: Visformer: The Vision-friendly Transformer
mindcv.models.visformer.Attention
¶
Bases: Cell
Attention layer
Source code in mindcv\models\visformer.py, lines 99-141
mindcv.models.visformer.Block
¶
Bases: Cell
visformer basic block
Source code in mindcv\models\visformer.py, lines 144-181
mindcv.models.visformer.Mlp
¶
Bases: Cell
MLP layer
Source code in mindcv\models\visformer.py, lines 51-96
mindcv.models.visformer.Visformer
¶
Bases: Cell
Visformer model class, based on "Visformer: The Vision-friendly Transformer" <https://arxiv.org/pdf/2104.12533.pdf>_
| PARAMETER | DESCRIPTION |
|---|---|
image_size |
images input size. Default: 224.
TYPE:
|
init_channels |
number of channels of the stem output. Default: 32.
TYPE:
|
num_classes |
number of classification classes. Default: 1000.
TYPE:
|
embed_dim |
embedding dimension of all heads. Default: 384.
TYPE:
|
depth |
model block depth. Default: None.
TYPE:
|
num_heads |
number of heads. Default: None.
TYPE:
|
mlp_ratio |
ratio of hidden features in Mlp. Default: 4.
TYPE:
|
qkv_bias |
whether the qkv layers have bias. Default: False.
TYPE:
|
qk_scale |
Override default qk scale of head_dim ** -0.5 if set.
TYPE:
|
drop_rate |
dropout rate. Default: 0.
TYPE:
|
attn_drop_rate |
attention layers dropout rate. Default: 0.
TYPE:
|
drop_path_rate |
drop path rate. Default: 0.1.
TYPE:
|
attn_stage |
a block will have an attention layer if the corresponding value is '1'. Default: '1111'.
TYPE:
|
pos_embed |
position embedding. Default: True.
TYPE:
|
spatial_conv |
a block will have a spatial convolution layer if the corresponding value is '1'. Default: '1111'.
TYPE:
|
group |
convolution group. Default: 8.
TYPE:
|
pool |
whether to use global pooling. Default: True.
TYPE:
|
conv_init |
whether to initialize convolution weights. Default: False.
DEFAULT:
|
Source code in mindcv\models\visformer.py, lines 210-436
mindcv.models.visformer.visformer_small(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get visformer small model. Refer to the base class 'models.visformer' for more details.
Source code in mindcv\models\visformer.py, lines 468-479
mindcv.models.visformer.visformer_small_v2(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get visformer small2 model. Refer to the base class 'models.visformer' for more details.
Source code in mindcv\models\visformer.py, lines 482-493
mindcv.models.visformer.visformer_tiny(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get visformer tiny model. Refer to the base class 'models.visformer' for more details.
Source code in mindcv\models\visformer.py, lines 439-451
mindcv.models.visformer.visformer_tiny_v2(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get visformer tiny2 model. Refer to the base class 'models.visformer' for more details.
Source code in mindcv\models\visformer.py, lines 454-465
vit¶
mindcv.models.vit
¶
ViT
mindcv.models.vit.Attention
¶
Bases: Cell
Attention layer implementation. Rearranges the input to B x N x hidden size.
| PARAMETER | DESCRIPTION |
|---|---|
dim |
The dimension of input features.
TYPE:
|
num_heads |
The number of attention heads. Default: 8.
TYPE:
|
qkv_bias |
Specifies whether the linear layer uses a bias vector. Default: True.
TYPE:
|
qk_norm |
Specifies whether to do normalization to q and k.
TYPE:
|
attn_drop |
The drop rate of attention; a value between 0 and 1. Default: 0.0.
TYPE:
|
proj_drop |
The drop rate of output; a value between 0 and 1. Default: 0.0.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
Tensor, output tensor. |
Examples:
>>> ops = Attention(768, 12)
Source code in mindcv\models\vit.py, lines 61-131
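Multi-head attention splits the embedding into num_heads slices of head_dim = dim // num_heads channels and scales the dot products by head_dim ** -0.5 before the softmax. A quick check of that arithmetic for the Attention(768, 12) example above (`attention_head_geometry` is an illustrative helper, not a mindcv API):

```python
def attention_head_geometry(dim, num_heads):
    # Each head attends over dim // num_heads channels; the dot products
    # are scaled by head_dim ** -0.5 before the softmax.
    assert dim % num_heads == 0
    head_dim = dim // num_heads
    return head_dim, head_dim ** -0.5

print(attention_head_geometry(768, 12))  # (64, 0.125)
```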
mindcv.models.vit.Block
¶
Bases: Cell
Transformer block implementation.
| PARAMETER | DESCRIPTION |
|---|---|
dim |
The dimension of embedding.
TYPE:
|
num_heads |
The number of attention heads.
TYPE:
|
qkv_bias |
Specifies whether the linear layer uses a bias vector. Default: True.
TYPE:
|
attn_drop |
The drop rate of attention; a value between 0 and 1. Default: 0.0.
TYPE:
|
proj_drop |
The drop rate of the dense layer output; a value between 0 and 1. Default: 0.0.
TYPE:
|
mlp_ratio |
The ratio used to scale the input dimensions to obtain the dimensions of the hidden layer.
TYPE:
|
drop_path |
The drop rate for drop path. Default: 0.0.
TYPE:
|
act_layer |
Activation function which will be stacked on top of the normalization layer (if not None), otherwise on top of the conv layer. Default: nn.GELU.
TYPE:
|
norm_layer |
Norm layer that will be stacked on top of the convolution layer. Default: nn.LayerNorm.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
Tensor, output tensor. |
Examples:
>>> ops = Block(768, 12)
Source code in mindcv\models\vit.py, lines 161-226
mindcv.models.vit.LayerScale
¶
Bases: Cell
Layer scale helps ViT improve the training dynamic, allowing the training of deeper, high-capacity image transformers that benefit from depth.
| PARAMETER | DESCRIPTION |
|---|---|
dim |
The output dimension of the attention layer or mlp layer.
TYPE:
|
init_values |
The scale factor. Default: 1e-5.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
Tensor, output tensor. |
Examples:
>>> ops = LayerScale(768, 0.01)
Source code in mindcv\models\vit.py, lines 134-158
mindcv.models.vit.VisionTransformer
¶
Bases: Cell
ViT encoder, which returns the feature encoded by transformer encoder.
Source code in mindcv\models\vit.py, lines 229-405
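For a standard ViT configuration, the transformer sequence length is the number of patch tokens plus an optional class token. A sketch assuming a ViT-B/16-style setup with 224x224 input (`vit_sequence_length` is an illustrative helper, not a mindcv API):

```python
def vit_sequence_length(image_size=224, patch_size=16, class_token=True):
    # Number of patch tokens plus the optional [CLS] token.
    grid = image_size // patch_size
    return grid * grid + (1 if class_token else 0)

print(vit_sequence_length())  # 197 = 14 * 14 patches + 1 class token
```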
volo¶
mindcv.models.volo
¶
Vision OutLooker (VOLO) implementation, modified from timm/models/vision_transformer.py.
mindcv.models.volo.Attention
¶
Bases: Cell
Implementation of self-attention
Source code in mindcv\models\volo.py, lines 253-294
mindcv.models.volo.ClassAttention
¶
Bases: Cell
Class attention layer from CaiT; see details in the CaiT paper. Class attention is the post stage in our VOLO, which is optional.
Source code in mindcv\models\volo.py, lines 335-386
mindcv.models.volo.ClassBlock
¶
Bases: Cell
Class attention block from CaiT; see details in the CaiT paper. We use two-layer class attention in our VOLO, which is optional.
Source code in mindcv\models\volo.py, lines 389-429
mindcv.models.volo.Downsample
¶
Bases: Cell
Image to Patch Embedding, downsampling between stage1 and stage2
Source code in mindcv\models\volo.py, lines 489-502
mindcv.models.volo.Fold
¶
Bases: Cell
Source code in mindcv\models\volo.py, lines 54-102
mindcv.models.volo.Fold.__init__(channels, output_size, kernel_size, dilation=1, padding=0, stride=1)
¶
Alternative implementation of fold layer via transposed convolution.
All parameters are the same as those of "torch.nn.Fold" <https://pytorch.org/docs/stable/generated/torch.nn.Fold.html>,
except for the additional channels parameter. We need channels to calculate the pre-allocated memory
size of the convolution kernel.
:param channels: same as the C in the document of "torch.nn.Fold"
<https://pytorch.org/docs/stable/generated/torch.nn.Fold.html>
:type channels: int
Source code in mindcv\models\volo.py, lines 55-90
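The number of sliding blocks handled by a fold/unfold operation follows the formula in the torch.nn.Fold documentation. A sketch of that computation along one spatial dimension (`sliding_blocks` is an illustrative helper, not part of mindcv):

```python
def sliding_blocks(size, kernel_size, dilation=1, padding=0, stride=1):
    # Number of sliding blocks along one spatial dimension, matching the
    # formula in the torch.nn.Fold / torch.nn.Unfold documentation:
    # floor((size + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# A 28x28 map with a 3x3 kernel, padding 1, stride 2 yields 14 blocks per side.
print(sliding_blocks(28, 3, padding=1, stride=2))  # 14
```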
mindcv.models.volo.Mlp
¶
Bases: Cell
Implementation of MLP
Source code in mindcv\models\volo.py, lines 225-250
mindcv.models.volo.OutlookAttention
¶
Bases: Cell
Implementation of outlook attention.
--dim: hidden dim
--num_heads: number of heads
--kernel_size: kernel size in each window for outlook attention
Returns: token features after outlook attention.
Source code in mindcv\models\volo.py, lines 105-175
mindcv.models.volo.Outlooker
¶
Bases: Cell
Implementation of the outlooker layer, which includes outlook attention + MLP. Outlooker is the first stage in our VOLO.
--dim: hidden dim
--num_heads: number of heads
--mlp_ratio: mlp ratio
--kernel_size: kernel size in each window for outlook attention
Returns: outlooker layer.
Source code in mindcv\models\volo.py, lines 178-222
mindcv.models.volo.PatchEmbed
¶
Bases: Cell
Image to Patch Embedding. Unlike ViT, which uses a single conv layer, we use 4 conv layers for patch embedding.
Source code in mindcv\models\volo.py, lines 440-486
mindcv.models.volo.Transformer
¶
Bases: Cell
Implementation of the Transformer; the Transformer is the second stage in our VOLO.
Source code in mindcv\models\volo.py, lines 297-332
mindcv.models.volo.VOLO
¶
Bases: Cell
Vision Outlooker, the main class of our model.
--layers: [x,x,x,x], four blocks in two stages; the first block is outlooker, the other three are transformer. We set four blocks, which are easily applied to downstream tasks.
--img_size, --in_channels, --num_classes: self-explanatory.
--patch_size: patch size in outlook attention.
--stem_hidden_dim: hidden dim of patch embedding; 64 for d1-d4, 128 for d5.
--embed_dims, --num_heads: embedding dim and number of heads in each block.
--downsamples: flags to apply downsampling or not.
--outlook_attention: flags to apply outlook attention or not.
--mlp_ratios, --qkv_bias, --qk_scale, --drop_rate: self-explanatory.
--attn_drop_rate, --drop_path_rate, --norm_layer: self-explanatory.
--post_layers: post layers such as two class attention layers using [ca, ca]; if set, return_mean=False.
--return_mean: use the mean of all feature tokens for classification; if so, no class token is used.
--return_dense: use token labeling; details: https://github.com/zihangJiang/TokenLabeling
--mix_token: mix tokens as in token labeling; details: https://github.com/zihangJiang/TokenLabeling
--pooling_scale: pooling_scale=2 means we downsample 2x.
--out_kernel, --out_stride, --out_padding: kernel size, stride, and padding for outlook attention.
Source code in mindcv\models\volo.py, lines 550-742
mindcv.models.volo.get_block(block_type, **kargs)
¶
Get a block by name; used here specifically for the class attention block.
Source code in mindcv\models\volo.py, lines 432-437
mindcv.models.volo.outlooker_blocks(block_fn, index, dim, layers, num_heads=1, kernel_size=3, padding=1, stride=1, mlp_ratio=3.0, qkv_bias=False, qk_scale=None, attn_drop=0.0, drop_path_rate=0.0, **kwargs)
¶
Generate outlooker layers in stage 1. Returns: outlooker layers.
Source code in mindcv\models\volo.py, lines 505-523
mindcv.models.volo.transformer_blocks(block_fn, index, dim, layers, num_heads, mlp_ratio=3.0, qkv_bias=False, qk_scale=None, attn_drop=0, drop_path_rate=0.0, **kwargs)
¶
Generate transformer layers in stage 2. Returns: transformer layers.
Source code in mindcv\models\volo.py, lines 526-547
mindcv.models.volo.volo_d1(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
VOLO-D1 model, Params: 27M.
--layers: [x,x,x,x], four blocks in two stages; the first stage (block) is outlooker, the other three blocks are transformer. We set four blocks, which are easily applied to downstream tasks.
--embed_dims, --num_heads: embedding dim and number of heads in each block.
--downsamples: flags to apply downsampling or not in the four blocks.
--outlook_attention: flags to apply outlook attention or not.
--mlp_ratios: mlp ratio in the four blocks.
--post_layers: post layers such as two class attention layers using [ca, ca].
See the class VOLO() for details of all args.
Source code in mindcv\models\volo.py
mindcv.models.volo.volo_d2(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
VOLO-D2 model, Params: 59M
Source code in mindcv\models\volo.py
mindcv.models.volo.volo_d3(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
VOLO-D3 model, Params: 86M
Source code in mindcv\models\volo.py
mindcv.models.volo.volo_d4(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
VOLO-D4 model, Params: 193M
Source code in mindcv\models\volo.py
mindcv.models.volo.volo_d5(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
VOLO-D5 model, Params: 296M. stem_hidden_dim=128: the hidden dim in the patch embedding is 128 for VOLO-D5.
Source code in mindcv\models\volo.py
xcit¶
mindcv.models.xcit
¶
MindSpore implementation of XCiT. Refer to XCiT: Cross-Covariance Image Transformers.
mindcv.models.xcit.ClassAttention
¶
Bases: Cell
Class Attention Layer as in CaiT https://arxiv.org/abs/2103.17239
Source code in mindcv\models\xcit.py
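In class attention, only the CLS token forms a query and attends over the full token sequence, so its cost is linear in the number of tokens rather than quadratic. A single-head numpy sketch of the idea (shapes, weight names, and the absence of biases are illustrative assumptions, not the module's real signature):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def class_attention(x, wq, wk, wv):
    """x: (N, D) token sequence with x[0] the CLS token.

    Only the CLS token produces a query; keys and values come from all
    N tokens, so the attention map is (1, N) instead of (N, N).
    """
    q = x[:1] @ wq                                     # (1, D) CLS query
    k = x @ wk                                         # (N, D)
    v = x @ wv                                         # (N, D)
    attn = softmax((q @ k.T) / np.sqrt(k.shape[-1]))   # (1, N)
    return attn @ v                                    # updated CLS token, (1, D)

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))
w = [rng.standard_normal((8, 8)) for _ in range(3)]
cls_out = class_attention(x, *w)
```

Only the CLS token is updated; the patch tokens pass through unchanged, which is why CaiT applies these layers only at the end of the network.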
mindcv.models.xcit.ClassAttentionBlock
¶
Bases: Cell
Class Attention Block as in CaiT https://arxiv.org/abs/2103.17239
Source code in mindcv\models\xcit.py
mindcv.models.xcit.ConvPatchEmbed
¶
Bases: Cell
Image to Patch Embedding using multiple convolutional layers
Source code in mindcv\models\xcit.py
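Whatever the exact conv stack, a patch embedding that downsamples by `patch_size` in each spatial dimension yields `(H / patch_size) * (W / patch_size)` tokens. A small sketch of that arithmetic (the helper name is hypothetical; the real module additionally projects each patch to `embed_dim` channels):

```python
def num_patches(img_size, patch_size):
    """Token count after a patch embedding that downsamples the input
    by `patch_size` per side (e.g. a stack of stride-2 convolutions)."""
    assert img_size % patch_size == 0, "image must divide evenly into patches"
    side = img_size // patch_size
    return side * side

n16 = num_patches(224, 16)  # 14 x 14 grid
n8 = num_patches(224, 8)    # 28 x 28 grid
```

Halving the patch size quadruples the token count, which is why the p8 XCiT variants are markedly more expensive than their p16 counterparts.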
mindcv.models.xcit.LPI
¶
Bases: Cell
Local Patch Interaction module that allows explicit communication between tokens in 3x3 windows to augment the implicit communication performed by the block-diagonal self-attention. Implemented with two layers of separable 3x3 convolutions with GELU and BatchNorm2d.
Source code in mindcv\models\xcit.py
mindcv.models.xcit.PositionalEncodingFourier
¶
Bases: Cell
Positional encoding relying on a Fourier kernel matching the one used in the "Attention Is All You Need" paper. The implementation builds on the DETR code https://github.com/facebookresearch/detr/blob/master/models/position_encoding.py
Source code in mindcv\models\xcit.py
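A simplified numpy sketch of a 2D sine/cosine positional encoding in the spirit of the DETR scheme: half the channels encode the y coordinate, half the x coordinate, each at geometrically spaced frequencies. Normalization and scaling details of the real module are deliberately omitted:

```python
import numpy as np

def fourier_pos_encoding(h, w, dim, temperature=10000.0):
    """Return a (h, w, dim) grid of sine/cosine positional features.

    dim is split evenly between the y and x axes; within each axis,
    sin and cos pairs share a frequency, as in the Transformer paper.
    """
    half = dim // 2
    freq = temperature ** (2 * np.arange(half // 2) / half)     # (half//2,)
    ys = np.arange(h, dtype=float)[:, None] / freq              # (h, half//2)
    xs = np.arange(w, dtype=float)[:, None] / freq              # (w, half//2)
    y_emb = np.concatenate([np.sin(ys), np.cos(ys)], axis=-1)   # (h, half)
    x_emb = np.concatenate([np.sin(xs), np.cos(xs)], axis=-1)   # (w, half)
    # Broadcast each axis embedding over the other axis and stack channels.
    return np.concatenate(
        [np.broadcast_to(y_emb[:, None, :], (h, w, half)),
         np.broadcast_to(x_emb[None, :, :], (h, w, half))], axis=-1)

pe = fourier_pos_encoding(14, 14, 64)
```

Because the encoding is computed from the grid size rather than learned, it can be regenerated for any feature-map resolution at inference time.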
mindcv.models.xcit.XCA
¶
Bases: Cell
Cross-Covariance Attention (XCA) operation where the channels are updated using a weighted sum. The weights are obtained from the (softmax-normalized) cross-covariance matrix (Q^T K, of shape d_h x d_h).
Source code in mindcv\models\xcit.py
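The key property of XCA is that attention acts on the transposed problem: the attention map is d_h x d_h over channels, so its cost scales with the embedding dimension rather than quadratically with the token count. A single-head numpy sketch under that reading of the paper (the scalar `tau` stands in for the module's learned temperature; layout and names are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def xca(q, k, v, tau=1.0):
    """Cross-covariance attention for q, k, v of shape (N, d).

    Each channel of q and k is L2-normalized over the token axis, then
    the (d, d) cross-covariance q^T k is softmax-normalized and used to
    mix the channels of v.
    """
    qn = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-8)
    kn = k / (np.linalg.norm(k, axis=0, keepdims=True) + 1e-8)
    attn = softmax((qn.T @ kn) * tau, axis=-1)   # (d, d) channel attention
    return v @ attn.T                            # (N, d) updated tokens

rng = np.random.default_rng(1)
q, k, v = (rng.standard_normal((196, 16)) for _ in range(3))
out = xca(q, k, v)
```

With 196 tokens and 16 channels the attention map here is only 16 x 16, versus 196 x 196 for ordinary token self-attention — the trade-off that lets XCiT handle high-resolution inputs.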
mindcv.models.xcit.XCiT
¶
Bases: Cell
XCiT model class, based on
"XCiT: Cross-Covariance Image Transformers" <https://arxiv.org/abs/2106.09681>_
Args:
img_size (int, tuple): input image size
patch_size (int, tuple): patch size
in_chans (int): number of input channels
num_classes (int): number of classes for the classification head
embed_dim (int): embedding dimension
depth (int): depth of the transformer
num_heads (int): number of attention heads
mlp_ratio (int): ratio of MLP hidden dim to embedding dim
qkv_bias (bool): enable bias for qkv if True
qk_scale (float): override the default qk scale of head_dim ** -0.5 if set
drop_rate (float): dropout rate
attn_drop_rate (float): attention dropout rate
drop_path_rate (float): stochastic depth rate
norm_layer (nn.Cell): normalization layer
cls_attn_layers (int): depth of class attention layers
use_pos (bool): whether to use positional encoding
eta (float): LayerScale initialization value
tokens_norm (bool): whether to normalize all tokens or just the cls_token in the CA
Source code in mindcv\models\xcit.py
mindcv.models.xcit.conv3x3(in_planes, out_planes, stride=1)
¶
3x3 convolution with padding
Source code in mindcv\models\xcit.py
mindcv.models.xcit.xcit_tiny_12_p16_224(pretrained=False, num_classes=1000, in_channels=3, **kwargs)
¶
Get xcit_tiny_12_p16_224 model. Refer to the base class 'models.XCiT' for more details.
Source code in mindcv\models\xcit.py